===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are in the same git repository,
under different directories. For further information, see the `getting
started guide <https://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/llvm/llvm-project.git

Next you need to obtain the CMake build system and Ninja build tool.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass.

Finally, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm

The second command will bring up a GUI for configuring Clang. You need
to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.


Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``clang-tools-extra`` repository.

.. code-block:: console

      cd ~/clang-llvm
      mkdir clang-tools-extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
      vim clang-tools-extra/loop-convert/CMakeLists.txt

CMakeLists.txt should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        PRIVATE
        clangAST
        clangASTMatchers
        clangBasic
        clangFrontend
        clangSerialization
        clangTooling
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``clang-tools-extra/loop-convert/LoopConvert.cpp``.
A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...\n");

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loading
them from a compilation database - there just aren't any options needed
right now.

Intermezzo: Learn AST matcher basics
====================================

Clang recently introduced the :doc:`ASTMatcher
library <LibASTMatchers>` to provide a simple, powerful, and
concise way to describe specific patterns in the AST.
Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
matcher to handle overloaded operators.

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
References <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Let's start by defining a matcher
which will capture all ``for`` statements that define a new variable
initialized to zero. Let's start with matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

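
To see why a character or floating-point zero falls through, it can help
to model the nesting by hand. The following is a minimal stand-alone
sketch, not Clang's implementation: ``Node``, ``Matcher``, and every
function below are hypothetical stand-ins that only borrow the matcher
names, and the toy tree layout (the first child of a ``ForStmt`` is its
init statement, and so on) is an assumption made for this example.

.. code-block:: c++

      // Toy model of matcher nesting. NOT the Clang API: all names here are
      // hypothetical stand-ins that merely mimic the real matcher names.
      #include <functional>
      #include <string>
      #include <vector>

      struct Node {
        std::string Kind;        // "ForStmt", "DeclStmt", "VarDecl", "IntegerLiteral", ...
        long Value;              // literal value; unused for other kinds
        std::vector<Node> Kids;  // toy layout: Kids[0] is the init / initializer
      };

      using Matcher = std::function<bool(const Node &)>;

      Matcher always() { return [](const Node &) { return true; }; }

      // Node matchers: check the node kind, then delegate to the inner matcher.
      Matcher nodeOfKind(const std::string &Kind, Matcher Inner) {
        return [=](const Node &N) { return N.Kind == Kind && Inner(N); };
      }
      Matcher forStmt(Matcher Inner = always()) { return nodeOfKind("ForStmt", Inner); }
      Matcher declStmt(Matcher Inner = always()) { return nodeOfKind("DeclStmt", Inner); }
      Matcher varDecl(Matcher Inner = always()) { return nodeOfKind("VarDecl", Inner); }
      Matcher integerLiteral(Matcher Inner = always()) {
        return nodeOfKind("IntegerLiteral", Inner);
      }

      // Narrowing matcher: tests a property of the current node.
      Matcher equals(long V) {
        return [=](const Node &N) { return N.Value == V; };
      }

      // Traversal matchers: step from the current node to one of its children.
      Matcher firstKid(Matcher Inner) {
        return [=](const Node &N) { return !N.Kids.empty() && Inner(N.Kids[0]); };
      }
      Matcher hasLoopInit(Matcher Inner) { return firstKid(Inner); }
      Matcher hasInitializer(Matcher Inner) { return firstKid(Inner); }
      Matcher hasSingleDecl(Matcher Inner) {
        return [=](const Node &N) { return N.Kids.size() == 1 && Inner(N.Kids[0]); };
      }

      // Same shape as the tutorial's matcher, built from the toy pieces above.
      Matcher zeroInitLoop() {
        return forStmt(hasLoopInit(declStmt(hasSingleDecl(
            varDecl(hasInitializer(integerLiteral(equals(0))))))));
      }

      // Builds the toy tree for `for (T i = <literal>; ...)`.
      Node loopWithInit(const std::string &LiteralKind, long Value) {
        Node Literal{LiteralKind, Value, {}};
        Node Var{"VarDecl", 0, {Literal}};
        Node Decl{"DeclStmt", 0, {Var}};
        return Node{"ForStmt", 0, {Decl}};
      }

In this model, ``zeroInitLoop()(loopWithInit("IntegerLiteral", 0))`` is
true, while a ``"CharacterLiteral"`` node with value 0 is rejected by the
kind check in ``integerLiteral`` before ``equals(0)`` is ever consulted.
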

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        virtual void run(const MatchFinder::MatchResult &Result) {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser& OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple-loops.cc
      bin/loop-convert ~/test-files/simple-loops.cc

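
The tutorial does not list the contents of the test file, so here is one
hypothetical possibility for ``simple-loops.cc``. The function names and
array contents are made up for illustration. With the ``LoopMatcher``
defined in this step, only the loop in ``sumForward`` should be dumped;
the comments explain why each of the others is expected to be skipped.

.. code-block:: c++

      // simple-loops.cc: hypothetical input for loop-convert.
      const int N = 5;
      int Arr[N] = {1, 2, 3, 4, 5};

      // Should match: the init statement declares exactly one variable whose
      // initializer is the integer literal 0.
      int sumForward() {
        int Sum = 0;
        for (int i = 0; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Should not match: the initializer is the literal 1, not 0.
      int sumSkippingFirst() {
        int Sum = 0;
        for (int i = 1; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Should not match: the init portion is an assignment expression, not a
      // declaration statement.
      int sumWithOuterIndex() {
        int Sum = 0;
        int i;
        for (i = 0; i < N; ++i)
          Sum += Arr[i];
        return Sum;
      }

      // Should not match: the init statement declares two variables, so
      // hasSingleDecl rejects it.
      int sumFromBothEnds() {
        int Sum = 0;
        for (int i = 0, j = N - 1; i < j; ++i, --j)
          Sum += Arr[i] + Arr[j];
        return Sum;
      }

The file has no ``main()`` because the tool only needs to parse it; add
one if you also want to compile and run the loops.
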

Step 3.5: More Complicated Matchers
===================================

Our simple matcher is capable of discovering for loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize for loops over arrays which would be eligible
for translation to range-based syntax? Range based loops over arrays of
size ``N`` that:

-  start at index ``0``
-  iterate consecutively
-  end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
Usages of variables are represented as ``DeclRefExpr``'s ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem - we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Why? Because it doesn't work. Of the three loops provided in
``test-files/simple.cpp``, zero of them have a matching condition. A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``MatchFinder::MatchResult&`` as its parameter. We're most interested in
its ``Context`` and ``Nodes`` members.
Clang uses the ``ASTContext``
class to represent contextual information about the AST, as the name
implies, though the most functionally important detail is that several
operations require an ``ASTContext*`` parameter. More immediately useful
is the set of matched nodes, and how we retrieve them.

Since we bind three variables (identified by ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes
by using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is ``NULL`` and compare the canonical Decls.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
test-files/simple.cpp, try to figure out which ones will be considered
potentially convertible.

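
To make the idea behind ``Profile()`` more concrete, here is a small
stand-alone model of comparing expressions by a structural description
rather than by address. It is only a sketch and not how Clang implements
profiling: ``ToyExpr``, ``ToyProfileID``, ``profile``, ``areSameToyExpr``,
and ``subscript`` are hypothetical names, and the "ID" is simply a list
of strings instead of a real ``llvm::FoldingSetNodeID``.

.. code-block:: c++

      // Toy model of structural profiling. NOT Clang's implementation: every
      // name in this block is hypothetical and exists only for illustration.
      #include <string>
      #include <vector>

      struct ToyExpr {
        std::string Kind;           // e.g. "ArraySubscript", "DeclRef"
        std::string Detail;         // referenced name or literal spelling
        std::vector<ToyExpr> Kids;  // sub-expressions, in source order
      };

      using ToyProfileID = std::vector<std::string>;

      // Append a description of E and, recursively, of all of its children.
      void profile(const ToyExpr &E, ToyProfileID &ID) {
        ID.push_back(E.Kind);
        ID.push_back(E.Detail);
        ID.push_back(std::to_string(E.Kids.size()));
        for (const ToyExpr &Kid : E.Kids)
          profile(Kid, ID);
      }

      // Two expressions are "the same" if their descriptions are identical,
      // even when they are distinct objects at different addresses.
      bool areSameToyExpr(const ToyExpr *First, const ToyExpr *Second) {
        if (!First || !Second)
          return false;
        ToyProfileID FirstID, SecondID;
        profile(*First, FirstID);
        profile(*Second, SecondID);
        return FirstID == SecondID;
      }

      // Builds the toy tree for `arr[i + Offset]`.
      ToyExpr subscript(const std::string &Offset) {
        ToyExpr Index{"BinaryOperator", "+",
                      {ToyExpr{"DeclRef", "i", {}},
                       ToyExpr{"IntegerLiteral", Offset, {}}}};
        return ToyExpr{"ArraySubscript", "", {ToyExpr{"DeclRef", "arr", {}}, Index}};
      }

Two separately built trees for ``arr[i + 1]`` compare equal in this
model, while ``arr[i + 1]`` and ``arr[i + 2]`` do not, which is the
behavior ``areSameExpr`` relies on.
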