===============================================================
Tutorial for building tools using LibTooling and LibASTMatchers
===============================================================

This document is intended to show how to build a useful source-to-source
translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
explicitly aimed at people who are new to Clang, so all you should need
is a working knowledge of C++ and the command line.

In order to work on the compiler, you need some basic knowledge of the
abstract syntax tree (AST). To this end, the reader is encouraged to
skim the :doc:`Introduction to the Clang
AST <IntroductionToTheClangAST>`.

Step 0: Obtaining Clang
=======================

As Clang is part of the LLVM project, you'll need to download LLVM's
source code first. Both Clang and LLVM are in the same Git repository,
under different directories. For further information, see the `getting
started guide <https://llvm.org/docs/GettingStarted.html>`_.

.. code-block:: console

      mkdir -p ~/clang-llvm && cd ~/clang-llvm
      git clone https://github.com/llvm/llvm-project.git

Next you need to obtain the CMake build system and Ninja build tool.

.. code-block:: console

      cd ~/clang-llvm
      git clone https://github.com/martine/ninja.git
      cd ninja
      git checkout release
      ./bootstrap.py
      sudo cp ninja /usr/bin/

      cd ~/clang-llvm
      git clone git://cmake.org/stage/cmake.git
      cd cmake
      git checkout next
      ./bootstrap
      make
      sudo make install

Okay. Now we'll build Clang!

.. code-block:: console

      cd ~/clang-llvm
      mkdir build && cd build
      cmake -G Ninja ../llvm-project/llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON  # Enable tests; default is off.
      ninja
      ninja check       # Test LLVM only.
      ninja clang-test  # Test Clang only.
      ninja install

And we're live.

All of the tests should pass.

Next, we want to set Clang as its own compiler.

.. code-block:: console

      cd ~/clang-llvm/build
      ccmake ../llvm-project/llvm

The second command will bring up a text-based interface for configuring
Clang. You need to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'``
to turn on advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it
to ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
configure, then ``'g'`` to generate CMake's files.

Finally, run ninja one last time, and you're done.

Step 1: Create a ClangTool
==========================

Now that we have enough background knowledge, it's time to create the
simplest productive ClangTool in existence: a syntax checker. While this
already exists as ``clang-check``, it's important to understand what's
going on.

First, we'll need to create a new directory for our tool and tell CMake
that it exists. As this is not going to be a core clang tool, it will
live in the ``clang-tools-extra`` directory.

.. code-block:: console

      cd ~/clang-llvm/llvm-project
      mkdir clang-tools-extra/loop-convert
      echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt
      vim clang-tools-extra/loop-convert/CMakeLists.txt

``CMakeLists.txt`` should have the following contents:

::

      set(LLVM_LINK_COMPONENTS support)

      add_clang_executable(loop-convert
        LoopConvert.cpp
        )
      target_link_libraries(loop-convert
        PRIVATE
        clangAST
        clangASTMatchers
        clangBasic
        clangFrontend
        clangSerialization
        clangTooling
        )

With that done, Ninja will be able to compile our tool. Let's give it
something to compile! Put the following into
``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
why the different parts are needed can be found in the `LibTooling
documentation <LibTooling.html>`_.

.. code-block:: c++

      // Declares clang::SyntaxOnlyAction.
      #include "clang/Frontend/FrontendActions.h"
      #include "clang/Tooling/CommonOptionsParser.h"
      #include "clang/Tooling/Tooling.h"
      // Declares llvm::cl::extrahelp.
      #include "llvm/Support/CommandLine.h"

      using namespace clang::tooling;
      using namespace llvm;

      // Apply a custom category to all command-line options so that they are the
      // only ones displayed.
      static llvm::cl::OptionCategory MyToolCategory("my-tool options");

      // CommonOptionsParser declares HelpMessage with a description of the common
      // command-line options related to the compilation database and input files.
      // It's nice to have this help message in all tools.
      static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

      // A help message for this specific tool can be added afterwards.
      static cl::extrahelp MoreHelp("\nMore help text...\n");

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser &OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());
        return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get());
      }

And that's it! You can compile our new tool by running ninja from the
``build`` directory.

.. code-block:: console

      cd ~/clang-llvm/build
      ninja

You should now be able to run the syntax checker, which is located in
``~/clang-llvm/build/bin``, on any source file. Try it!

.. code-block:: console

      echo "int main() { return 0; }" > test.cpp
      bin/loop-convert test.cpp --

Note the two dashes after we specify the source file. The additional
options for the compiler are passed after the dashes rather than loaded
from a compilation database. There just aren't any options needed
right now.
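The role of the two dashes can be pictured with a small standalone sketch. It is a simplified model rather than LibTooling's own code: ``SplitArgs`` and ``splitAtDoubleDash`` are hypothetical names introduced here, and the sketch ignores the tool's own options. It only shows the split into source paths (before ``--``) and compiler flags (after ``--``).

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of LibTooling: models how a command line such
// as `loop-convert test.cpp -- -std=c++17` divides into two groups.
struct SplitArgs {
  std::vector<std::string> SourcePaths;   // arguments before "--"
  std::vector<std::string> CompilerFlags; // arguments after "--"
  bool HasSeparator = false;              // true once "--" has been seen
};

SplitArgs splitAtDoubleDash(const std::vector<std::string> &Args) {
  SplitArgs Result;
  for (const std::string &Arg : Args) {
    // Only the first "--" acts as the separator.
    if (!Result.HasSeparator && Arg == "--") {
      Result.HasSeparator = true;
      continue;
    }
    if (Result.HasSeparator)
      Result.CompilerFlags.push_back(Arg);
    else
      Result.SourcePaths.push_back(Arg);
  }
  return Result;
}
```

In this model, ``test.cpp --`` yields one source path, no compiler flags, and ``HasSeparator`` set, which corresponds to the invocation shown above: a fixed, empty set of compiler options instead of a compilation database lookup.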

Intermezzo: Learn AST matcher basics
====================================

Clang provides the :doc:`ASTMatcher
library <LibASTMatchers>` as a simple, powerful, and
concise way to describe specific patterns in the AST. Implemented as a
DSL powered by macros and templates (see
`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
curious), matchers offer the feel of algebraic data types common to
functional programming languages.

For example, suppose you wanted to examine only binary operators. There
is a matcher to do exactly that, conveniently named ``binaryOperator``.
I'll give you one guess what this matcher does:

.. code-block:: c++

      binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))

Shockingly, it will match against addition expressions whose left-hand
side is exactly the literal 0. It will not match against other forms of
0, such as ``'\0'`` or ``NULL``, but it will match against macros that
expand to 0. The matcher will also not match against calls to the
overloaded operator ``'+'``, as there is a separate ``cxxOperatorCallExpr``
matcher to handle overloaded operators.
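To make the composition idea concrete without Clang, here is a toy sketch in which a matcher is just a predicate over a tiny hand-rolled tree. Everything in it (``Node``, ``Matcher``, ``nodeOfKind``, ``intLit``, ``declRef``, ``binOp``) is a stand-in invented for illustration; the real library is built from templates and macros and works quite differently inside. Only the nesting style mirrors the matcher shown above.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy AST node. Real Clang nodes are far richer.
struct Node {
  std::string Kind;           // e.g. "BinaryOperator", "IntegerLiteral"
  std::string Text;           // operator spelling or identifier name
  long Value = 0;             // value, for integer literals
  std::vector<Node> Children; // operands, left to right
};

using Matcher = std::function<bool(const Node &)>;

// Node matcher: checks the node kind, then every inner matcher.
Matcher nodeOfKind(const std::string &Kind, const std::vector<Matcher> &Inner) {
  return [Kind, Inner](const Node &N) {
    if (N.Kind != Kind)
      return false;
    for (const Matcher &M : Inner)
      if (!M(N))
        return false;
    return true;
  };
}
Matcher binaryOperator(const std::vector<Matcher> &Inner = {}) {
  return nodeOfKind("BinaryOperator", Inner);
}
Matcher integerLiteral(const std::vector<Matcher> &Inner = {}) {
  return nodeOfKind("IntegerLiteral", Inner);
}

// Narrowing matchers: test a property of the current node.
Matcher hasOperatorName(const std::string &Name) {
  return [Name](const Node &N) { return N.Text == Name; };
}
Matcher equals(long V) {
  return [V](const Node &N) { return N.Value == V; };
}

// Traversal matcher: moves to the first child and applies Inner there.
Matcher hasLHS(const Matcher &Inner) {
  return [Inner](const Node &N) {
    return !N.Children.empty() && Inner(N.Children[0]);
  };
}

// Helpers for building toy trees.
Node intLit(long V) {
  Node N;
  N.Kind = "IntegerLiteral";
  N.Value = V;
  return N;
}
Node declRef(const std::string &Name) {
  Node N;
  N.Kind = "DeclRefExpr";
  N.Text = Name;
  return N;
}
Node binOp(const std::string &Op, Node LHS, Node RHS) {
  Node N;
  N.Kind = "BinaryOperator";
  N.Text = Op;
  N.Children.push_back(std::move(LHS));
  N.Children.push_back(std::move(RHS));
  return N;
}
```

With these pieces, ``binaryOperator({hasOperatorName("+"), hasLHS(integerLiteral({equals(0)}))})`` accepts the toy tree for ``0 + x`` and rejects the one for ``x + 0``, because each layer either narrows the current node or steps to a child before delegating.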

There are AST matchers to match all the different nodes of the AST,
narrowing matchers to only match AST nodes fulfilling specific criteria,
and traversal matchers to get from one kind of AST node to another. For
a complete list of AST matchers, take a look at the `AST Matcher
Reference <LibASTMatchersReference.html>`_.

All matchers that are nouns describe entities in the AST and can be
bound, so that they can be referred to whenever a match is found. To do
so, simply call the method ``bind`` on these matchers, e.g.:

.. code-block:: c++

      varDecl(hasType(isInteger())).bind("intvar")
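Binding can be thought of as recording a pointer to the matched node under a string ID. The sketch below models only that idea. ``Decl``, ``BoundNodes``, and ``VarDeclMatcher`` are hypothetical toy types, and comparing a type name as a string is a stand-in for what ``hasType(isInteger())`` does in the real library.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy declaration node.
struct Decl {
  std::string Kind; // e.g. "VarDecl"
  std::string Name; // declared identifier
  std::string Type; // spelled type, e.g. "int"
};

// Maps a bind ID to the node that was matched under it.
using BoundNodes = std::map<std::string, const Decl *>;

// A matcher that can record what it matched.
struct VarDeclMatcher {
  std::string RequiredType; // empty means "any type"
  std::string BindID;       // empty means "do not bind"

  // Returns a copy of this matcher that records its match under ID.
  VarDeclMatcher bind(std::string ID) const {
    VarDeclMatcher Copy = *this;
    Copy.BindID = std::move(ID);
    return Copy;
  }

  // On success, stores a pointer to D in Nodes when a bind ID is set.
  bool matches(const Decl &D, BoundNodes &Nodes) const {
    if (D.Kind != "VarDecl")
      return false;
    if (!RequiredType.empty() && D.Type != RequiredType)
      return false;
    if (!BindID.empty())
      Nodes[BindID] = &D;
    return true;
  }
};
```

After a successful match, looking up ``"intvar"`` in the map gives back the matched declaration, which is the toy counterpart of retrieving a bound node in a match callback.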

Step 2: Using AST matchers
==========================

Okay, on to using matchers for real. Our goal is a matcher that
captures all ``for`` statements that define a new variable
initialized to zero. Let's start with matching all ``for`` loops:

.. code-block:: c++

      forStmt()

Next, we want to specify that a single variable is declared in the first
portion of the loop, so we can extend the matcher to

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))

Finally, we can add the condition that the variable is initialized to
zero.

.. code-block:: c++

      forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
        hasInitializer(integerLiteral(equals(0))))))))

It is fairly easy to read and understand the matcher definition ("match
loops whose init portion declares a single variable which is initialized
to the integer literal 0"), but deciding that every piece is necessary
is more difficult. Note that this matcher will not match loops whose
variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
zero besides the integer 0.

The last step is giving the matcher a name and binding the ``ForStmt``
as we will want to do something with it:

.. code-block:: c++

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

Once you have defined your matchers, you will need to add a little more
scaffolding in order to run them. Matchers are paired with a
``MatchCallback`` and registered with a ``MatchFinder`` object, then run
from a ``ClangTool``. More code!

Add the following to ``LoopConvert.cpp``:

.. code-block:: c++

      #include "clang/ASTMatchers/ASTMatchers.h"
      #include "clang/ASTMatchers/ASTMatchFinder.h"

      using namespace clang;
      using namespace clang::ast_matchers;

      StatementMatcher LoopMatcher =
        forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
          hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");

      class LoopPrinter : public MatchFinder::MatchCallback {
      public:
        // Called once for every node that LoopMatcher matches.
        void run(const MatchFinder::MatchResult &Result) override {
          if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
            FS->dump();
        }
      };

And change ``main()`` to:

.. code-block:: c++

      int main(int argc, const char **argv) {
        auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
        if (!ExpectedParser) {
          // Fail gracefully for unsupported options.
          llvm::errs() << ExpectedParser.takeError();
          return 1;
        }
        CommonOptionsParser &OptionsParser = ExpectedParser.get();
        ClangTool Tool(OptionsParser.getCompilations(),
                       OptionsParser.getSourcePathList());

        LoopPrinter Printer;
        MatchFinder Finder;
        Finder.addMatcher(LoopMatcher, &Printer);

        return Tool.run(newFrontendActionFactory(&Finder).get());
      }
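The register-then-run pattern used in ``main()`` can be sketched in isolation. The following is a toy model, not Clang's implementation: ``Stmt``, ``Callback``, ``ToyFinder``, and ``KindCounter`` are hypothetical names, and the traversal is a plain pre-order walk over a hand-built tree.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy statement node.
struct Stmt {
  std::string Kind;
  std::vector<Stmt> Children;
};

// Stand-in for a match callback: run() is invoked once per matched node.
struct Callback {
  virtual ~Callback() = default;
  virtual void run(const Stmt &Matched) = 0;
};

class ToyFinder {
public:
  using Matcher = std::function<bool(const Stmt &)>;

  // Pairs a matcher with the callback to invoke when it matches.
  // The finder does not take ownership of Action.
  void addMatcher(Matcher M, Callback *Action) {
    Entries.push_back({std::move(M), Action});
  }

  // Visits Root and all of its descendants in pre-order, trying every
  // registered matcher on every node.
  void matchAST(const Stmt &Root) const {
    for (const auto &Entry : Entries)
      if (Entry.first(Root))
        Entry.second->run(Root);
    for (const Stmt &Child : Root.Children)
      matchAST(Child);
  }

private:
  std::vector<std::pair<Matcher, Callback *>> Entries;
};

// Example callback: counts how many nodes were reported to it.
struct KindCounter : Callback {
  int Count = 0;
  void run(const Stmt &) override { ++Count; }
};
```

As in the real tool, the callback object must outlive the finder's traversal, which is why ``Printer`` is declared as a local variable in ``main()`` before ``Tool.run`` is called.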

Now, you should be able to recompile and run the code to discover ``for``
loops. Create a new file with a few examples, and test out our new
handiwork:

.. code-block:: console

      cd ~/clang-llvm/build
      ninja loop-convert
      vim ~/test-files/simple.cpp
      bin/loop-convert ~/test-files/simple.cpp --
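The tutorial does not list the contents of the test file, so here is one hypothetical version to experiment with. The function and variable names are invented for this example, and the comments state what the Step 2 matcher should do with each loop based on the matcher's definition.

```cpp
const int N = 5;
int Values[N] = {1, 2, 3, 4, 5};

// Should match: the init portion declares exactly one variable, and its
// initializer is the integer literal 0.
int sumForward() {
  int Sum = 0;
  for (int i = 0; i < N; ++i)
    Sum += Values[i];
  return Sum;
}

// Should not match: the init portion declares two variables, so
// hasSingleDecl() fails.
int sumWithTwoDecls() {
  int Sum = 0;
  for (int i = 0, j = N - 1; i <= j; ++i)
    Sum += Values[i];
  return Sum;
}

// Should not match: the init portion is an assignment expression, not a
// declaration statement.
int sumWithOuterIndex() {
  int Sum = 0;
  int i;
  for (i = 0; i < N; ++i)
    Sum += Values[i];
  return Sum;
}
```

Running the tool on a file like this should dump the AST of the loop in ``sumForward`` only. All three functions compute the same sum, which is a reminder that the matcher looks at syntax, not behavior.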

Step 3: More Complicated Matchers
=================================

Our simple matcher is capable of discovering ``for`` loops, but we would
still need to filter out many more ourselves. We can do a good portion
of the remaining work with some cleverly chosen matchers, but first we
need to decide exactly which properties we want to allow.

How can we characterize ``for`` loops over arrays which would be eligible
for translation to range-based syntax? They are loops over arrays of
size ``N`` that:

1. start at index ``0``
2. iterate consecutively
3. end at index ``N-1``

We already check for (1), so all we need to add is a check to the loop's
condition to ensure that the loop's index variable is compared against
``N`` and another check to ensure that the increment step just
increments this same variable. The matcher for (2) is straightforward:
require a pre- or post-increment of the same variable declared in the
init portion.

Unfortunately, such a matcher is impossible to write. Matchers contain
no logic for comparing two arbitrary AST nodes and determining whether
or not they are equal, so the best we can do is matching more than we
would like to allow, and punting extra comparisons to the callback.
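A few concrete loops may help here. The functions below are hypothetical examples, not taken from the tutorial's test files. The first is the kind of loop we want to find, the second is the range-based form it could be translated to, and the third has the right shape piece by piece but uses different variables in its init and increment portions. That last kind is exactly what a matcher alone cannot rule out and the callback has to reject.

```cpp
const int N = 4;
int Arr[N] = {3, 1, 4, 1};

// Eligible: starts at 0, increments the same variable by one, stops at N.
int sumIndexed() {
  int Sum = 0;
  for (int i = 0; i < N; ++i)
    Sum += Arr[i];
  return Sum;
}

// The range-based form the loop above could be rewritten to.
int sumRangeBased() {
  int Sum = 0;
  for (int Elem : Arr)
    Sum += Elem;
  return Sum;
}

// Each portion looks right in isolation (declaration initialized to 0,
// a `<` comparison, a unary increment), but the incremented and compared
// variable is Steps, not i. Telling these apart requires comparing nodes.
int countWithOtherVariable() {
  int Steps = 0;
  for (int i = 0; Steps < N; ++Steps) {
    (void)i; // i is declared but never drives the loop
  }
  return Steps;
}
```

``sumIndexed`` and ``sumRangeBased`` return the same value, which is the property a source-to-source translation has to preserve.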

In any case, we can start building this sub-matcher. We can require that
the increment step be a unary increment like this:

.. code-block:: c++

      hasIncrement(unaryOperator(hasOperatorName("++")))

Specifying what is incremented introduces another quirk of Clang's AST:
uses of variables are represented as ``DeclRefExpr`` nodes ("declaration
reference expressions") because they are expressions which refer to
variable declarations. To find a ``unaryOperator`` that refers to a
specific declaration, we can simply add a second condition to it:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr())))

Furthermore, we can restrict our matcher to only match if the
incremented variable is an integer:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))

And the last step will be to attach an identifier to this variable, so
that we can retrieve it in the callback:

.. code-block:: c++

      hasIncrement(unaryOperator(
        hasOperatorName("++"),
        hasUnaryOperand(declRefExpr(to(
          varDecl(hasType(isInteger())).bind("incrementVariable"))))))

We can add this code to the definition of ``LoopMatcher`` and make sure
that our program, outfitted with the new matcher, only prints out loops
that declare a single variable initialized to zero and have an increment
step consisting of a unary increment of some variable.

Now, we just need to add a matcher to check if the condition part of the
``for`` loop compares a variable against the size of the array. There is
only one problem: we don't know which array we're iterating over
without looking at the body of the loop! We are again restricted to
approximating the result we want with matchers, filling in the details
in the callback. So we start with:

.. code-block:: c++

      hasCondition(binaryOperator(hasOperatorName("<")))

It makes sense to ensure that the left-hand side is a reference to a
variable, and that the right-hand side has integer type.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
        hasRHS(expr(hasType(isInteger())))))

Unfortunately, this version does not work. Of the three loops provided in
``test-files/simple.cpp``, zero of them have a matching condition. A
quick look at the AST dump of the first for loop, produced by the
previous iteration of loop-convert, shows us the answer:

::

      (ForStmt 0x173b240
        (DeclStmt 0x173afc8
          0x173af50 "int i =
            (IntegerLiteral 0x173afa8 'int' 0)")
        <<<NULL>>>
        (BinaryOperator 0x173b060 '_Bool' '<'
          (ImplicitCastExpr 0x173b030 'int'
            (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
          (ImplicitCastExpr 0x173b048 'int'
            (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
        (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
          (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
        (CompoundStatement ...

We already know that the declaration and increments both match, or this
loop wouldn't have been dumped. The culprit lies in the implicit cast
applied to the first operand (i.e. the LHS) of the less-than operator,
an L-value to R-value conversion applied to the expression referencing
``i``. Thankfully, the matcher library offers a solution to this problem
in the form of ``ignoringParenImpCasts``, which instructs the matcher to
ignore implicit casts and parentheses before continuing to match.
Adjusting the condition operator will restore the desired match.

.. code-block:: c++

      hasCondition(binaryOperator(
        hasOperatorName("<"),
        hasLHS(ignoringParenImpCasts(declRefExpr(
          to(varDecl(hasType(isInteger())))))),
        hasRHS(expr(hasType(isInteger())))))
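The effect of ``ignoringParenImpCasts`` can be modeled as peeling wrapper nodes off an expression until something else is reached. The sketch below is a toy version of that idea with hypothetical types (``Expr`` here is a hand-rolled struct, not ``clang::Expr``). In Clang itself, the corresponding operation on expression nodes is ``Expr::IgnoreParenImpCasts()``.

```cpp
#include <memory>
#include <string>

// Toy expression node.
struct Expr {
  std::string Kind;          // "DeclRefExpr", "ImplicitCastExpr", "ParenExpr"
  std::string Name;          // referenced variable, for DeclRefExpr
  std::shared_ptr<Expr> Sub; // wrapped expression, for casts and parentheses
};

// Peels off ImplicitCastExpr and ParenExpr wrappers and returns the first
// expression underneath that is neither.
const Expr &ignoreParenImpCasts(const Expr &E) {
  const Expr *Cur = &E;
  while ((Cur->Kind == "ImplicitCastExpr" || Cur->Kind == "ParenExpr") &&
         Cur->Sub)
    Cur = Cur->Sub.get();
  return *Cur;
}

// Toy counterpart of a declRefExpr() node matcher.
bool isDeclRef(const Expr &E) { return E.Kind == "DeclRefExpr"; }
```

Applied to the dump above, the LHS of ``<`` is an ``ImplicitCastExpr``, so ``isDeclRef`` fails on it directly but succeeds once the wrapper has been peeled away. This mirrors why the earlier condition matcher found nothing and the adjusted one does.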

After adding binds to the expressions we wished to capture and
extracting the identifier strings into variables, we have array-step-2
completed.

Step 4: Retrieving Matched Nodes
================================

So far, the matcher callback isn't very interesting: it just dumps the
loop's AST. At some point, we will need to make changes to the input
source code. Next, we'll work on using the nodes we bound in the
previous step.

The ``MatchFinder::MatchCallback::run()`` callback takes a
``const MatchFinder::MatchResult &`` as its parameter. We're most
interested in its ``Context`` and ``Nodes`` members. Clang uses the
``ASTContext`` class to represent contextual information about the AST,
as the name implies, though the most functionally important detail is
that several operations require an ``ASTContext*`` parameter. More
immediately useful is the set of matched nodes, and how we retrieve them.

Since we bind three variables (under the IDs ``"condVarName"``,
``"initVarName"``, and ``"incVarName"``), we can obtain the matched nodes by
using the ``getNodeAs()`` member function.

In ``LoopConvert.cpp`` add

.. code-block:: c++

      #include "clang/AST/ASTContext.h"

Change ``LoopMatcher`` to

.. code-block:: c++

      StatementMatcher LoopMatcher =
          forStmt(hasLoopInit(declStmt(
                      hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0))))
                                        .bind("initVarName")))),
                  hasIncrement(unaryOperator(
                      hasOperatorName("++"),
                      hasUnaryOperand(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("incVarName")))))),
                  hasCondition(binaryOperator(
                      hasOperatorName("<"),
                      hasLHS(ignoringParenImpCasts(declRefExpr(
                          to(varDecl(hasType(isInteger())).bind("condVarName"))))),
                      hasRHS(expr(hasType(isInteger())))))).bind("forLoop");

And change ``LoopPrinter::run`` to

.. code-block:: c++

      void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
        ASTContext *Context = Result.Context;
        const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop");
        // We do not want to convert header files!
        if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc()))
          return;
        const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName");
        const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName");
        const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName");

        if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
          return;
        llvm::outs() << "Potential array-based loop discovered.\n";
      }

Clang associates a ``VarDecl`` with each variable to represent the variable's
declaration. Since the "canonical" form of each declaration is unique by
address, all we need to do is make sure neither ``ValueDecl`` (base class of
``VarDecl``) is null and compare the canonical Decls.

.. code-block:: c++

      static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
        return First && Second &&
               First->getCanonicalDecl() == Second->getCanonicalDecl();
      }
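Why compare canonical declarations rather than the pointers themselves? A variable can be declared more than once (for example an ``extern`` declaration followed by a definition), and each declaration gets its own node. The toy model below illustrates the idea under a simplifying assumption: each redeclaration links to the previous one, and the first declaration in the chain serves as the canonical one. ``ToyDecl`` and ``areSameToyVariable`` are hypothetical names for this sketch.

```cpp
#include <string>
#include <utility>

// Toy declaration that may redeclare an earlier one.
struct ToyDecl {
  std::string Name;
  const ToyDecl *Previous; // earlier declaration of the same entity, or null

  explicit ToyDecl(std::string N, const ToyDecl *Prev = nullptr)
      : Name(std::move(N)), Previous(Prev) {}

  // Walks back to the first declaration in the chain.
  const ToyDecl *getCanonicalDecl() const {
    const ToyDecl *D = this;
    while (D->Previous)
      D = D->Previous;
    return D;
  }
};

// Same shape as areSameVariable above, applied to the toy type.
bool areSameToyVariable(const ToyDecl *First, const ToyDecl *Second) {
  return First && Second &&
         First->getCanonicalDecl() == Second->getCanonicalDecl();
}
```

Two nodes in the same redeclaration chain compare equal, while an unrelated variable that merely shares a name does not. The null checks matter in the callback because ``getNodeAs`` returns null when nothing was bound under the requested ID.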

If execution reaches the end of ``LoopPrinter::run()``, we know that the
loop shell looks like:

.. code-block:: c++

      for (int i = 0; i < expr(); ++i) { ... }

For now, we will just print a message explaining that we found a loop.
The next section will deal with recursively traversing the AST to
discover all changes needed.

As a side note, it's not as trivial to test if two expressions are the same,
though Clang has already done the hard work for us by providing a way to
canonicalize expressions:

.. code-block:: c++

      static bool areSameExpr(ASTContext *Context, const Expr *First,
                              const Expr *Second) {
        if (!First || !Second)
          return false;
        llvm::FoldingSetNodeID FirstID, SecondID;
        First->Profile(FirstID, *Context, true);
        Second->Profile(SecondID, *Context, true);
        return FirstID == SecondID;
      }

This code relies on the comparison between two
``llvm::FoldingSetNodeIDs``. As the documentation for
``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
a description of a node in the AST, based on its properties, along with
those of its children. ``FoldingSetNodeID`` then serves as a hash we can
use to compare expressions. We will need ``areSameExpr`` later. Before
you run the new code on the additional loops added to
``test-files/simple.cpp``, try to figure out which ones will be considered
potentially convertible.
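The profiling idea can be illustrated with a toy structural comparison: flatten each expression tree into a sequence that describes the node and its children, then compare the sequences. This is only a sketch of the principle. ``ToyExpr``, ``profile``, and ``areSameToyExpr`` are hypothetical, and the real ``Profile()`` records different data than the strings used here.

```cpp
#include <string>
#include <vector>

// Toy expression tree.
struct ToyExpr {
  std::string Kind; // e.g. "BinaryOperator"
  std::string Text; // operator spelling, identifier, or literal text
  std::vector<ToyExpr> Children;
};

// Appends a description of E, followed by descriptions of its children.
// Recording the child count keeps the flattened sequence unambiguous.
void profile(const ToyExpr &E, std::vector<std::string> &ID) {
  ID.push_back(E.Kind);
  ID.push_back(E.Text);
  ID.push_back(std::to_string(E.Children.size()));
  for (const ToyExpr &Child : E.Children)
    profile(Child, ID);
}

// Two expressions are "the same" when their profiles are identical, even
// though they are distinct objects at distinct addresses.
bool areSameToyExpr(const ToyExpr &First, const ToyExpr &Second) {
  std::vector<std::string> FirstID, SecondID;
  profile(First, FirstID);
  profile(Second, SecondID);
  return FirstID == SecondID;
}
```

Under this model, two separately built trees for ``n + 1`` compare equal, while ``1 + n`` does not, since the comparison is purely structural and knows nothing about commutativity.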
567