106f32e7eSjoerg======================== 206f32e7eSjoergLLVM Programmer's Manual 306f32e7eSjoerg======================== 406f32e7eSjoerg 506f32e7eSjoerg.. contents:: 606f32e7eSjoerg :local: 706f32e7eSjoerg 806f32e7eSjoerg.. warning:: 906f32e7eSjoerg This is always a work in progress. 1006f32e7eSjoerg 1106f32e7eSjoerg.. _introduction: 1206f32e7eSjoerg 1306f32e7eSjoergIntroduction 1406f32e7eSjoerg============ 1506f32e7eSjoerg 1606f32e7eSjoergThis document is meant to highlight some of the important classes and interfaces 1706f32e7eSjoergavailable in the LLVM source-base. This manual is not intended to explain what 1806f32e7eSjoergLLVM is, how it works, and what LLVM code looks like. It assumes that you know 1906f32e7eSjoergthe basics of LLVM and are interested in writing transformations or otherwise 2006f32e7eSjoerganalyzing or manipulating the code. 2106f32e7eSjoerg 2206f32e7eSjoergThis document should get you oriented so that you can find your way in the 2306f32e7eSjoergcontinuously growing source code that makes up the LLVM infrastructure. Note 2406f32e7eSjoergthat this manual is not intended to serve as a replacement for reading the 2506f32e7eSjoergsource code, so if you think there should be a method in one of these classes to 2606f32e7eSjoergdo something, but it's not listed, check the source. Links to the `doxygen 27*da58b97aSjoerg<https://llvm.org/doxygen/>`__ sources are provided to make this as easy as 2806f32e7eSjoergpossible. 2906f32e7eSjoerg 3006f32e7eSjoergThe first section of this document describes general information that is useful 3106f32e7eSjoergto know when working in the LLVM infrastructure, and the second describes the 3206f32e7eSjoergCore LLVM classes. In the future this manual will be extended with information 3306f32e7eSjoergdescribing how to use extension libraries, such as dominator information, CFG 3406f32e7eSjoergtraversal routines, and useful utilities like the ``InstVisitor`` (`doxygen 35*da58b97aSjoerg<https://llvm.org/doxygen/InstVisitor_8h_source.html>`__) template. 3606f32e7eSjoerg 3706f32e7eSjoerg.. _general: 3806f32e7eSjoerg 3906f32e7eSjoergGeneral Information 4006f32e7eSjoerg=================== 4106f32e7eSjoerg 4206f32e7eSjoergThis section contains general information that is useful if you are working in 4306f32e7eSjoergthe LLVM source-base, but that isn't specific to any particular API. 4406f32e7eSjoerg 4506f32e7eSjoerg.. _stl: 4606f32e7eSjoerg 4706f32e7eSjoergThe C++ Standard Template Library 4806f32e7eSjoerg--------------------------------- 4906f32e7eSjoerg 5006f32e7eSjoergLLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much 5106f32e7eSjoergmore than you are used to, or have seen before. Because of this, you might want 5206f32e7eSjoergto do a little background reading in the techniques used and capabilities of the 5306f32e7eSjoerglibrary. There are many good pages that discuss the STL, and several books on 5406f32e7eSjoergthe subject that you can get, so it will not be discussed in this document. 5506f32e7eSjoerg 5606f32e7eSjoergHere are some useful links: 5706f32e7eSjoerg 5806f32e7eSjoerg#. `cppreference.com 5906f32e7eSjoerg <http://en.cppreference.com/w/>`_ - an excellent 6006f32e7eSjoerg reference for the STL and other parts of the standard C++ library. 6106f32e7eSjoerg 6206f32e7eSjoerg#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly 6306f32e7eSjoerg book in the making. It has a decent Standard Library Reference that rivals 6406f32e7eSjoerg Dinkumware's, and is unfortunately no longer free since the book has been 6506f32e7eSjoerg published. 6606f32e7eSjoerg 6706f32e7eSjoerg#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_. 6806f32e7eSjoerg 6906f32e7eSjoerg#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a 7006f32e7eSjoerg useful `Introduction to the STL 7106f32e7eSjoerg <http://www.sgi.com/tech/stl/stl_introduction.html>`_. 7206f32e7eSjoerg 7306f32e7eSjoerg#. `Bjarne Stroustrup's C++ Page 74*da58b97aSjoerg <http://www.stroustrup.com/C++.html>`_. 7506f32e7eSjoerg 7606f32e7eSjoerg#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 7706f32e7eSjoerg (even better, get the book) 7806f32e7eSjoerg <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_. 7906f32e7eSjoerg 8006f32e7eSjoergYou are also encouraged to take a look at the :doc:`LLVM Coding Standards 8106f32e7eSjoerg<CodingStandards>` guide which focuses on how to write maintainable code more 8206f32e7eSjoergthan where to put your curly braces. 8306f32e7eSjoerg 8406f32e7eSjoerg.. _resources: 8506f32e7eSjoerg 8606f32e7eSjoergOther useful references 8706f32e7eSjoerg----------------------- 8806f32e7eSjoerg 8906f32e7eSjoerg#. `Using static and shared libraries across platforms 9006f32e7eSjoerg <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_ 9106f32e7eSjoerg 9206f32e7eSjoerg.. _apis: 9306f32e7eSjoerg 9406f32e7eSjoergImportant and useful LLVM APIs 9506f32e7eSjoerg============================== 9606f32e7eSjoerg 9706f32e7eSjoergHere we highlight some LLVM APIs that are generally useful and good to know 9806f32e7eSjoergabout when writing transformations. 9906f32e7eSjoerg 10006f32e7eSjoerg.. _isa: 10106f32e7eSjoerg 10206f32e7eSjoergThe ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates 10306f32e7eSjoerg------------------------------------------------------ 10406f32e7eSjoerg 10506f32e7eSjoergThe LLVM source-base makes extensive use of a custom form of RTTI. These 10606f32e7eSjoergtemplates have many similarities to the C++ ``dynamic_cast<>`` operator, but 10706f32e7eSjoergthey don't have some drawbacks (primarily stemming from the fact that 10806f32e7eSjoerg``dynamic_cast<>`` only works on classes that have a v-table). Because they are 10906f32e7eSjoergused so often, you must know what they do and how they work. All of these 11006f32e7eSjoergtemplates are defined in the ``llvm/Support/Casting.h`` (`doxygen 111*da58b97aSjoerg<https://llvm.org/doxygen/Casting_8h_source.html>`__) file (note that you very 11206f32e7eSjoergrarely have to include this file directly). 11306f32e7eSjoerg 11406f32e7eSjoerg``isa<>``: 11506f32e7eSjoerg The ``isa<>`` operator works exactly like the Java "``instanceof``" operator. 11606f32e7eSjoerg It returns true or false depending on whether a reference or pointer points to 11706f32e7eSjoerg an instance of the specified class. This can be very useful for constraint 11806f32e7eSjoerg checking of various sorts (example below). 11906f32e7eSjoerg 12006f32e7eSjoerg``cast<>``: 12106f32e7eSjoerg The ``cast<>`` operator is a "checked cast" operation. It converts a pointer 12206f32e7eSjoerg or reference from a base class to a derived class, causing an assertion 12306f32e7eSjoerg failure if it is not really an instance of the right type. This should be 12406f32e7eSjoerg used in cases where you have some information that makes you believe that 12506f32e7eSjoerg something is of the right type. An example of the ``isa<>`` and ``cast<>`` 12606f32e7eSjoerg template is: 12706f32e7eSjoerg 12806f32e7eSjoerg .. code-block:: c++ 12906f32e7eSjoerg 13006f32e7eSjoerg static bool isLoopInvariant(const Value *V, const Loop *L) { 13106f32e7eSjoerg if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) 13206f32e7eSjoerg return true; 13306f32e7eSjoerg 13406f32e7eSjoerg // Otherwise, it must be an instruction... 13506f32e7eSjoerg return !L->contains(cast<Instruction>(V)->getParent()); 13606f32e7eSjoerg } 13706f32e7eSjoerg 13806f32e7eSjoerg Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``, 13906f32e7eSjoerg for that use the ``dyn_cast<>`` operator. 14006f32e7eSjoerg 14106f32e7eSjoerg``dyn_cast<>``: 14206f32e7eSjoerg The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see 14306f32e7eSjoerg if the operand is of the specified type, and if so, returns a pointer to it 14406f32e7eSjoerg (this operator does not work with references). If the operand is not of the 14506f32e7eSjoerg correct type, a null pointer is returned. Thus, this works very much like 14606f32e7eSjoerg the ``dynamic_cast<>`` operator in C++, and should be used in the same 14706f32e7eSjoerg circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if`` 14806f32e7eSjoerg statement or some other flow control statement like this: 14906f32e7eSjoerg 15006f32e7eSjoerg .. code-block:: c++ 15106f32e7eSjoerg 15206f32e7eSjoerg if (auto *AI = dyn_cast<AllocationInst>(Val)) { 15306f32e7eSjoerg // ... 15406f32e7eSjoerg } 15506f32e7eSjoerg 15606f32e7eSjoerg This form of the ``if`` statement effectively combines together a call to 15706f32e7eSjoerg ``isa<>`` and a call to ``cast<>`` into one statement, which is very 15806f32e7eSjoerg convenient. 15906f32e7eSjoerg 16006f32e7eSjoerg Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's 16106f32e7eSjoerg ``instanceof`` operator, can be abused. In particular, you should not use big 16206f32e7eSjoerg chained ``if/then/else`` blocks to check for lots of different variants of 16306f32e7eSjoerg classes. If you find yourself wanting to do this, it is much cleaner and more 16406f32e7eSjoerg efficient to use the ``InstVisitor`` class to dispatch over the instruction 16506f32e7eSjoerg type directly. 16606f32e7eSjoerg 16706f32e7eSjoerg``isa_and_nonnull<>``: 16806f32e7eSjoerg The ``isa_and_nonnull<>`` operator works just like the ``isa<>`` operator, 16906f32e7eSjoerg except that it allows for a null pointer as an argument (which it then 17006f32e7eSjoerg returns false). This can sometimes be useful, allowing you to combine several 17106f32e7eSjoerg null checks into one. 17206f32e7eSjoerg 17306f32e7eSjoerg``cast_or_null<>``: 17406f32e7eSjoerg The ``cast_or_null<>`` operator works just like the ``cast<>`` operator, 17506f32e7eSjoerg except that it allows for a null pointer as an argument (which it then 17606f32e7eSjoerg propagates). This can sometimes be useful, allowing you to combine several 17706f32e7eSjoerg null checks into one. 17806f32e7eSjoerg 17906f32e7eSjoerg``dyn_cast_or_null<>``: 18006f32e7eSjoerg The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>`` 18106f32e7eSjoerg operator, except that it allows for a null pointer as an argument (which it 18206f32e7eSjoerg then propagates). This can sometimes be useful, allowing you to combine 18306f32e7eSjoerg several null checks into one. 18406f32e7eSjoerg 18506f32e7eSjoergThese five templates can be used with any classes, whether they have a v-table 18606f32e7eSjoergor not. If you want to add support for these templates, see the document 18706f32e7eSjoerg:doc:`How to set up LLVM-style RTTI for your class hierarchy 18806f32e7eSjoerg<HowToSetUpLLVMStyleRTTI>` 18906f32e7eSjoerg 19006f32e7eSjoerg.. _string_apis: 19106f32e7eSjoerg 19206f32e7eSjoergPassing strings (the ``StringRef`` and ``Twine`` classes) 19306f32e7eSjoerg--------------------------------------------------------- 19406f32e7eSjoerg 19506f32e7eSjoergAlthough LLVM generally does not do much string manipulation, we do have several 19606f32e7eSjoergimportant APIs which take strings. Two important examples are the Value class 19706f32e7eSjoerg-- which has names for instructions, functions, etc. -- and the ``StringMap`` 19806f32e7eSjoergclass which is used extensively in LLVM and Clang. 19906f32e7eSjoerg 20006f32e7eSjoergThese are generic classes, and they need to be able to accept strings which may 20106f32e7eSjoerghave embedded null characters. Therefore, they cannot simply take a ``const 20206f32e7eSjoergchar *``, and taking a ``const std::string&`` requires clients to perform a heap 20306f32e7eSjoergallocation which is usually unnecessary. Instead, many LLVM APIs use a 20406f32e7eSjoerg``StringRef`` or a ``const Twine&`` for passing strings efficiently. 20506f32e7eSjoerg 20606f32e7eSjoerg.. _StringRef: 20706f32e7eSjoerg 20806f32e7eSjoergThe ``StringRef`` class 20906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 21006f32e7eSjoerg 21106f32e7eSjoergThe ``StringRef`` data type represents a reference to a constant string (a 21206f32e7eSjoergcharacter array and a length) and supports the common operations available on 21306f32e7eSjoerg``std::string``, but does not require heap allocation. 21406f32e7eSjoerg 21506f32e7eSjoergIt can be implicitly constructed using a C style null-terminated string, an 21606f32e7eSjoerg``std::string``, or explicitly with a character pointer and length. For 21706f32e7eSjoergexample, the ``StringRef`` find function is declared as: 21806f32e7eSjoerg 21906f32e7eSjoerg.. code-block:: c++ 22006f32e7eSjoerg 22106f32e7eSjoerg iterator find(StringRef Key); 22206f32e7eSjoerg 22306f32e7eSjoergand clients can call it using any one of: 22406f32e7eSjoerg 22506f32e7eSjoerg.. code-block:: c++ 22606f32e7eSjoerg 22706f32e7eSjoerg Map.find("foo"); // Lookup "foo" 22806f32e7eSjoerg Map.find(std::string("bar")); // Lookup "bar" 22906f32e7eSjoerg Map.find(StringRef("\0baz", 4)); // Lookup "\0baz" 23006f32e7eSjoerg 23106f32e7eSjoergSimilarly, APIs which need to return a string may return a ``StringRef`` 23206f32e7eSjoerginstance, which can be used directly or converted to an ``std::string`` using 23306f32e7eSjoergthe ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen 234*da58b97aSjoerg<https://llvm.org/doxygen/StringRef_8h_source.html>`__) for more 23506f32e7eSjoerginformation. 23606f32e7eSjoerg 23706f32e7eSjoergYou should rarely use the ``StringRef`` class directly, because it contains 23806f32e7eSjoergpointers to external memory it is not generally safe to store an instance of the 23906f32e7eSjoergclass (unless you know that the external storage will not be freed). 24006f32e7eSjoerg``StringRef`` is small and pervasive enough in LLVM that it should always be 24106f32e7eSjoergpassed by value. 24206f32e7eSjoerg 24306f32e7eSjoergThe ``Twine`` class 24406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 24506f32e7eSjoerg 246*da58b97aSjoergThe ``Twine`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1Twine.html>`__) 24706f32e7eSjoergclass is an efficient way for APIs to accept concatenated strings. For example, 24806f32e7eSjoerga common LLVM paradigm is to name one instruction based on the name of another 24906f32e7eSjoerginstruction with a suffix, for example: 25006f32e7eSjoerg 25106f32e7eSjoerg.. code-block:: c++ 25206f32e7eSjoerg 25306f32e7eSjoerg New = CmpInst::Create(..., SO->getName() + ".cmp"); 25406f32e7eSjoerg 25506f32e7eSjoergThe ``Twine`` class is effectively a lightweight `rope 25606f32e7eSjoerg<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to 25706f32e7eSjoergtemporary (stack allocated) objects. Twines can be implicitly constructed as 25806f32e7eSjoergthe result of the plus operator applied to strings (i.e., a C strings, an 25906f32e7eSjoerg``std::string``, or a ``StringRef``). The twine delays the actual concatenation 26006f32e7eSjoergof strings until it is actually required, at which point it can be efficiently 26106f32e7eSjoergrendered directly into a character array. This avoids unnecessary heap 26206f32e7eSjoergallocation involved in constructing the temporary results of string 26306f32e7eSjoergconcatenation. See ``llvm/ADT/Twine.h`` (`doxygen 264*da58b97aSjoerg<https://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>` 26506f32e7eSjoergfor more information. 26606f32e7eSjoerg 26706f32e7eSjoergAs with a ``StringRef``, ``Twine`` objects point to external memory and should 26806f32e7eSjoergalmost never be stored or mentioned directly. They are intended solely for use 26906f32e7eSjoergwhen defining a function which should be able to efficiently accept concatenated 27006f32e7eSjoergstrings. 27106f32e7eSjoerg 27206f32e7eSjoerg.. _formatting_strings: 27306f32e7eSjoerg 27406f32e7eSjoergFormatting strings (the ``formatv`` function) 27506f32e7eSjoerg--------------------------------------------- 27606f32e7eSjoergWhile LLVM doesn't necessarily do a lot of string manipulation and parsing, it 27706f32e7eSjoergdoes do a lot of string formatting. From diagnostic messages, to llvm tool 27806f32e7eSjoergoutputs such as ``llvm-readobj`` to printing verbose disassembly listings and 27906f32e7eSjoergLLDB runtime logging, the need for string formatting is pervasive. 28006f32e7eSjoerg 28106f32e7eSjoergThe ``formatv`` is similar in spirit to ``printf``, but uses a different syntax 28206f32e7eSjoergwhich borrows heavily from Python and C#. Unlike ``printf`` it deduces the type 28306f32e7eSjoergto be formatted at compile time, so it does not need a format specifier such as 28406f32e7eSjoerg``%d``. This reduces the mental overhead of trying to construct portable format 28506f32e7eSjoergstrings, especially for platform-specific types like ``size_t`` or pointer types. 28606f32e7eSjoergUnlike both ``printf`` and Python, it additionally fails to compile if LLVM does 28706f32e7eSjoergnot know how to format the type. These two properties ensure that the function 28806f32e7eSjoergis both safer and simpler to use than traditional formatting methods such as 28906f32e7eSjoergthe ``printf`` family of functions. 29006f32e7eSjoerg 29106f32e7eSjoergSimple formatting 29206f32e7eSjoerg^^^^^^^^^^^^^^^^^ 29306f32e7eSjoerg 29406f32e7eSjoergA call to ``formatv`` involves a single **format string** consisting of 0 or more 29506f32e7eSjoerg**replacement sequences**, followed by a variable length list of **replacement values**. 29606f32e7eSjoergA replacement sequence is a string of the form ``{N[[,align]:style]}``. 29706f32e7eSjoerg 29806f32e7eSjoerg``N`` refers to the 0-based index of the argument from the list of replacement 29906f32e7eSjoergvalues. Note that this means it is possible to reference the same parameter 30006f32e7eSjoergmultiple times, possibly with different style and/or alignment options, in any order. 30106f32e7eSjoerg 30206f32e7eSjoerg``align`` is an optional string specifying the width of the field to format 30306f32e7eSjoergthe value into, and the alignment of the value within the field. It is specified as 30406f32e7eSjoergan optional **alignment style** followed by a positive integral **field width**. The 30506f32e7eSjoergalignment style can be one of the characters ``-`` (left align), ``=`` (center align), 30606f32e7eSjoergor ``+`` (right align). The default is right aligned. 30706f32e7eSjoerg 30806f32e7eSjoerg``style`` is an optional string consisting of a type specific that controls the 30906f32e7eSjoergformatting of the value. For example, to format a floating point value as a percentage, 31006f32e7eSjoergyou can use the style option ``P``. 31106f32e7eSjoerg 31206f32e7eSjoergCustom formatting 31306f32e7eSjoerg^^^^^^^^^^^^^^^^^ 31406f32e7eSjoerg 31506f32e7eSjoergThere are two ways to customize the formatting behavior for a type. 31606f32e7eSjoerg 31706f32e7eSjoerg1. Provide a template specialization of ``llvm::format_provider<T>`` for your 31806f32e7eSjoerg type ``T`` with the appropriate static format method. 31906f32e7eSjoerg 32006f32e7eSjoerg .. code-block:: c++ 32106f32e7eSjoerg 32206f32e7eSjoerg namespace llvm { 32306f32e7eSjoerg template<> 32406f32e7eSjoerg struct format_provider<MyFooBar> { 32506f32e7eSjoerg static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) { 32606f32e7eSjoerg // Do whatever is necessary to format `V` into `Stream` 32706f32e7eSjoerg } 32806f32e7eSjoerg }; 32906f32e7eSjoerg void foo() { 33006f32e7eSjoerg MyFooBar X; 33106f32e7eSjoerg std::string S = formatv("{0}", X); 33206f32e7eSjoerg } 33306f32e7eSjoerg } 33406f32e7eSjoerg 33506f32e7eSjoerg This is a useful extensibility mechanism for adding support for formatting your own 33606f32e7eSjoerg custom types with your own custom Style options. But it does not help when you want 33706f32e7eSjoerg to extend the mechanism for formatting a type that the library already knows how to 33806f32e7eSjoerg format. For that, we need something else. 33906f32e7eSjoerg 34006f32e7eSjoerg2. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``. 34106f32e7eSjoerg 34206f32e7eSjoerg .. code-block:: c++ 34306f32e7eSjoerg 34406f32e7eSjoerg namespace anything { 34506f32e7eSjoerg struct format_int_custom : public llvm::FormatAdapter<int> { 34606f32e7eSjoerg explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {} 34706f32e7eSjoerg void format(llvm::raw_ostream &Stream, StringRef Style) override { 34806f32e7eSjoerg // Do whatever is necessary to format ``this->Item`` into ``Stream`` 34906f32e7eSjoerg } 35006f32e7eSjoerg }; 35106f32e7eSjoerg } 35206f32e7eSjoerg namespace llvm { 35306f32e7eSjoerg void foo() { 35406f32e7eSjoerg std::string S = formatv("{0}", anything::format_int_custom(42)); 35506f32e7eSjoerg } 35606f32e7eSjoerg } 35706f32e7eSjoerg 35806f32e7eSjoerg If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv`` 35906f32e7eSjoerg will call the 36006f32e7eSjoerg ``format`` method on the argument passing in the specified style. This allows 36106f32e7eSjoerg one to provide custom formatting of any type, including one which already has 36206f32e7eSjoerg a builtin format provider. 36306f32e7eSjoerg 36406f32e7eSjoerg``formatv`` Examples 36506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 36606f32e7eSjoergBelow is intended to provide an incomplete set of examples demonstrating 36706f32e7eSjoergthe usage of ``formatv``. More information can be found by reading the 36806f32e7eSjoergdoxygen documentation or by looking at the unit test suite. 36906f32e7eSjoerg 37006f32e7eSjoerg 37106f32e7eSjoerg.. code-block:: c++ 37206f32e7eSjoerg 37306f32e7eSjoerg std::string S; 37406f32e7eSjoerg // Simple formatting of basic types and implicit string conversion. 37506f32e7eSjoerg S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)" 37606f32e7eSjoerg 37706f32e7eSjoerg // Out-of-order referencing and multi-referencing 37806f32e7eSjoerg outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1" 37906f32e7eSjoerg 38006f32e7eSjoerg // Left, right, and center alignment 38106f32e7eSjoerg S = formatv("{0,7}", 'a'); // S == " a"; 38206f32e7eSjoerg S = formatv("{0,-7}", 'a'); // S == "a "; 38306f32e7eSjoerg S = formatv("{0,=7}", 'a'); // S == " a "; 38406f32e7eSjoerg S = formatv("{0,+7}", 'a'); // S == " a"; 38506f32e7eSjoerg 38606f32e7eSjoerg // Custom styles 38706f32e7eSjoerg S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8" 38806f32e7eSjoerg 38906f32e7eSjoerg // Adapters 39006f32e7eSjoerg S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == " 42 " 39106f32e7eSjoerg S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi" 39206f32e7eSjoerg S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == " hi " 39306f32e7eSjoerg 39406f32e7eSjoerg // Ranges 39506f32e7eSjoerg std::vector<int> V = {8, 9, 10}; 39606f32e7eSjoerg S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10" 39706f32e7eSjoerg S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10" 39806f32e7eSjoerg S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA" 39906f32e7eSjoerg 40006f32e7eSjoerg.. _error_apis: 40106f32e7eSjoerg 40206f32e7eSjoergError handling 40306f32e7eSjoerg-------------- 40406f32e7eSjoerg 40506f32e7eSjoergProper error handling helps us identify bugs in our code, and helps end-users 40606f32e7eSjoergunderstand errors in their tool usage. Errors fall into two broad categories: 40706f32e7eSjoerg*programmatic* and *recoverable*, with different strategies for handling and 40806f32e7eSjoergreporting. 40906f32e7eSjoerg 41006f32e7eSjoergProgrammatic Errors 41106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 41206f32e7eSjoerg 41306f32e7eSjoergProgrammatic errors are violations of program invariants or API contracts, and 41406f32e7eSjoergrepresent bugs within the program itself. Our aim is to document invariants, and 41506f32e7eSjoergto abort quickly at the point of failure (providing some basic diagnostic) when 41606f32e7eSjoerginvariants are broken at runtime. 41706f32e7eSjoerg 41806f32e7eSjoergThe fundamental tools for handling programmatic errors are assertions and the 41906f32e7eSjoergllvm_unreachable function. Assertions are used to express invariant conditions, 42006f32e7eSjoergand should include a message describing the invariant: 42106f32e7eSjoerg 42206f32e7eSjoerg.. code-block:: c++ 42306f32e7eSjoerg 42406f32e7eSjoerg assert(isPhysReg(R) && "All virt regs should have been allocated already."); 42506f32e7eSjoerg 42606f32e7eSjoergThe llvm_unreachable function can be used to document areas of control flow 42706f32e7eSjoergthat should never be entered if the program invariants hold: 42806f32e7eSjoerg 42906f32e7eSjoerg.. code-block:: c++ 43006f32e7eSjoerg 43106f32e7eSjoerg enum { Foo, Bar, Baz } X = foo(); 43206f32e7eSjoerg 43306f32e7eSjoerg switch (X) { 43406f32e7eSjoerg case Foo: /* Handle Foo */; break; 43506f32e7eSjoerg case Bar: /* Handle Bar */; break; 43606f32e7eSjoerg default: 43706f32e7eSjoerg llvm_unreachable("X should be Foo or Bar here"); 43806f32e7eSjoerg } 43906f32e7eSjoerg 44006f32e7eSjoergRecoverable Errors 44106f32e7eSjoerg^^^^^^^^^^^^^^^^^^ 44206f32e7eSjoerg 44306f32e7eSjoergRecoverable errors represent an error in the program's environment, for example 44406f32e7eSjoerga resource failure (a missing file, a dropped network connection, etc.), or 44506f32e7eSjoergmalformed input. These errors should be detected and communicated to a level of 44606f32e7eSjoergthe program where they can be handled appropriately. Handling the error may be 44706f32e7eSjoergas simple as reporting the issue to the user, or it may involve attempts at 44806f32e7eSjoergrecovery. 44906f32e7eSjoerg 45006f32e7eSjoerg.. note:: 45106f32e7eSjoerg 45206f32e7eSjoerg While it would be ideal to use this error handling scheme throughout 45306f32e7eSjoerg LLVM, there are places where this hasn't been practical to apply. In 45406f32e7eSjoerg situations where you absolutely must emit a non-programmatic error and 45506f32e7eSjoerg the ``Error`` model isn't workable you can call ``report_fatal_error``, 456*da58b97aSjoerg which will call installed error handlers, print a message, and abort the 457*da58b97aSjoerg program. The use of `report_fatal_error` in this case is discouraged. 45806f32e7eSjoerg 45906f32e7eSjoergRecoverable errors are modeled using LLVM's ``Error`` scheme. This scheme 46006f32e7eSjoergrepresents errors using function return values, similar to classic C integer 46106f32e7eSjoergerror codes, or C++'s ``std::error_code``. However, the ``Error`` class is 46206f32e7eSjoergactually a lightweight wrapper for user-defined error types, allowing arbitrary 46306f32e7eSjoerginformation to be attached to describe the error. This is similar to the way C++ 46406f32e7eSjoergexceptions allow throwing of user-defined types. 46506f32e7eSjoerg 46606f32e7eSjoergSuccess values are created by calling ``Error::success()``, E.g.: 46706f32e7eSjoerg 46806f32e7eSjoerg.. code-block:: c++ 46906f32e7eSjoerg 47006f32e7eSjoerg Error foo() { 47106f32e7eSjoerg // Do something. 47206f32e7eSjoerg // Return success. 47306f32e7eSjoerg return Error::success(); 47406f32e7eSjoerg } 47506f32e7eSjoerg 47606f32e7eSjoergSuccess values are very cheap to construct and return - they have minimal 47706f32e7eSjoergimpact on program performance. 47806f32e7eSjoerg 47906f32e7eSjoergFailure values are constructed using ``make_error<T>``, where ``T`` is any class 48006f32e7eSjoergthat inherits from the ErrorInfo utility, E.g.: 48106f32e7eSjoerg 48206f32e7eSjoerg.. code-block:: c++ 48306f32e7eSjoerg 48406f32e7eSjoerg class BadFileFormat : public ErrorInfo<BadFileFormat> { 48506f32e7eSjoerg public: 48606f32e7eSjoerg static char ID; 48706f32e7eSjoerg std::string Path; 48806f32e7eSjoerg 48906f32e7eSjoerg BadFileFormat(StringRef Path) : Path(Path.str()) {} 49006f32e7eSjoerg 49106f32e7eSjoerg void log(raw_ostream &OS) const override { 49206f32e7eSjoerg OS << Path << " is malformed"; 49306f32e7eSjoerg } 49406f32e7eSjoerg 49506f32e7eSjoerg std::error_code convertToErrorCode() const override { 49606f32e7eSjoerg return make_error_code(object_error::parse_failed); 49706f32e7eSjoerg } 49806f32e7eSjoerg }; 49906f32e7eSjoerg 50006f32e7eSjoerg char BadFileFormat::ID; // This should be declared in the C++ file. 50106f32e7eSjoerg 50206f32e7eSjoerg Error printFormattedFile(StringRef Path) { 50306f32e7eSjoerg if (<check for valid format>) 50406f32e7eSjoerg return make_error<BadFileFormat>(Path); 50506f32e7eSjoerg // print file contents. 50606f32e7eSjoerg return Error::success(); 50706f32e7eSjoerg } 50806f32e7eSjoerg 50906f32e7eSjoergError values can be implicitly converted to bool: true for error, false for 51006f32e7eSjoergsuccess, enabling the following idiom: 51106f32e7eSjoerg 51206f32e7eSjoerg.. code-block:: c++ 51306f32e7eSjoerg 51406f32e7eSjoerg Error mayFail(); 51506f32e7eSjoerg 51606f32e7eSjoerg Error foo() { 51706f32e7eSjoerg if (auto Err = mayFail()) 51806f32e7eSjoerg return Err; 51906f32e7eSjoerg // Success! We can proceed. 52006f32e7eSjoerg ... 52106f32e7eSjoerg 52206f32e7eSjoergFor functions that can fail but need to return a value the ``Expected<T>`` 52306f32e7eSjoergutility can be used. Values of this type can be constructed with either a 52406f32e7eSjoerg``T``, or an ``Error``. Expected<T> values are also implicitly convertible to 52506f32e7eSjoergboolean, but with the opposite convention to ``Error``: true for success, false 52606f32e7eSjoergfor error. If success, the ``T`` value can be accessed via the dereference 52706f32e7eSjoergoperator. If failure, the ``Error`` value can be extracted using the 52806f32e7eSjoerg``takeError()`` method. Idiomatic usage looks like: 52906f32e7eSjoerg 53006f32e7eSjoerg.. code-block:: c++ 53106f32e7eSjoerg 53206f32e7eSjoerg Expected<FormattedFile> openFormattedFile(StringRef Path) { 53306f32e7eSjoerg // If badly formatted, return an error. 53406f32e7eSjoerg if (auto Err = checkFormat(Path)) 53506f32e7eSjoerg return std::move(Err); 53606f32e7eSjoerg // Otherwise return a FormattedFile instance. 53706f32e7eSjoerg return FormattedFile(Path); 53806f32e7eSjoerg } 53906f32e7eSjoerg 54006f32e7eSjoerg Error processFormattedFile(StringRef Path) { 54106f32e7eSjoerg // Try to open a formatted file 54206f32e7eSjoerg if (auto FileOrErr = openFormattedFile(Path)) { 54306f32e7eSjoerg // On success, grab a reference to the file and continue. 54406f32e7eSjoerg auto &File = *FileOrErr; 54506f32e7eSjoerg ... 54606f32e7eSjoerg } else 54706f32e7eSjoerg // On error, extract the Error value and return it. 54806f32e7eSjoerg return FileOrErr.takeError(); 54906f32e7eSjoerg } 55006f32e7eSjoerg 55106f32e7eSjoergIf an ``Expected<T>`` value is in success mode then the ``takeError()`` method 55206f32e7eSjoergwill return a success value. Using this fact, the above function can be 55306f32e7eSjoergrewritten as: 55406f32e7eSjoerg 55506f32e7eSjoerg.. code-block:: c++ 55606f32e7eSjoerg 55706f32e7eSjoerg Error processFormattedFile(StringRef Path) { 55806f32e7eSjoerg // Try to open a formatted file 55906f32e7eSjoerg auto FileOrErr = openFormattedFile(Path); 56006f32e7eSjoerg if (auto Err = FileOrErr.takeError()) 56106f32e7eSjoerg // On error, extract the Error value and return it. 56206f32e7eSjoerg return Err; 56306f32e7eSjoerg // On success, grab a reference to the file and continue. 56406f32e7eSjoerg auto &File = *FileOrErr; 56506f32e7eSjoerg ... 56606f32e7eSjoerg } 56706f32e7eSjoerg 56806f32e7eSjoergThis second form is often more readable for functions that involve multiple 56906f32e7eSjoerg``Expected<T>`` values as it limits the indentation required. 57006f32e7eSjoerg 57106f32e7eSjoergAll ``Error`` instances, whether success or failure, must be either checked or 57206f32e7eSjoergmoved from (via ``std::move`` or a return) before they are destructed. 57306f32e7eSjoergAccidentally discarding an unchecked error will cause a program abort at the 57406f32e7eSjoergpoint where the unchecked value's destructor is run, making it easy to identify 57506f32e7eSjoergand fix violations of this rule. 57606f32e7eSjoerg 57706f32e7eSjoergSuccess values are considered checked once they have been tested (by invoking 57806f32e7eSjoergthe boolean conversion operator): 57906f32e7eSjoerg 58006f32e7eSjoerg.. code-block:: c++ 58106f32e7eSjoerg 58206f32e7eSjoerg if (auto Err = mayFail(...)) 58306f32e7eSjoerg return Err; // Failure value - move error to caller. 58406f32e7eSjoerg 58506f32e7eSjoerg // Safe to continue: Err was checked. 58606f32e7eSjoerg 58706f32e7eSjoergIn contrast, the following code will always cause an abort, even if ``mayFail`` 58806f32e7eSjoergreturns a success value: 58906f32e7eSjoerg 59006f32e7eSjoerg.. code-block:: c++ 59106f32e7eSjoerg 59206f32e7eSjoerg mayFail(); 59306f32e7eSjoerg // Program will always abort here, even if mayFail() returns Success, since 59406f32e7eSjoerg // the value is not checked. 59506f32e7eSjoerg 59606f32e7eSjoergFailure values are considered checked once a handler for the error type has 59706f32e7eSjoergbeen activated: 59806f32e7eSjoerg 59906f32e7eSjoerg.. code-block:: c++ 60006f32e7eSjoerg 60106f32e7eSjoerg handleErrors( 60206f32e7eSjoerg processFormattedFile(...), 60306f32e7eSjoerg [](const BadFileFormat &BFF) { 60406f32e7eSjoerg report("Unable to process " + BFF.Path + ": bad format"); 60506f32e7eSjoerg }, 60606f32e7eSjoerg [](const FileNotFound &FNF) { 60706f32e7eSjoerg report("File not found " + FNF.Path); 60806f32e7eSjoerg }); 60906f32e7eSjoerg 61006f32e7eSjoergThe ``handleErrors`` function takes an error as its first argument, followed by 61106f32e7eSjoerga variadic list of "handlers", each of which must be a callable type (a 61206f32e7eSjoergfunction, lambda, or class with a call operator) with one argument. The 61306f32e7eSjoerg``handleErrors`` function will visit each handler in the sequence and check its 61406f32e7eSjoergargument type against the dynamic type of the error, running the first handler 61506f32e7eSjoergthat matches. This is the same decision process that is used decide which catch 61606f32e7eSjoergclause to run for a C++ exception. 61706f32e7eSjoerg 61806f32e7eSjoergSince the list of handlers passed to ``handleErrors`` may not cover every error 61906f32e7eSjoergtype that can occur, the ``handleErrors`` function also returns an Error value 62006f32e7eSjoergthat must be checked or propagated. If the error value that is passed to 62106f32e7eSjoerg``handleErrors`` does not match any of the handlers it will be returned from 62206f32e7eSjoerghandleErrors. Idiomatic use of ``handleErrors`` thus looks like: 62306f32e7eSjoerg 62406f32e7eSjoerg.. code-block:: c++ 62506f32e7eSjoerg 62606f32e7eSjoerg if (auto Err = 62706f32e7eSjoerg handleErrors( 62806f32e7eSjoerg processFormattedFile(...), 62906f32e7eSjoerg [](const BadFileFormat &BFF) { 63006f32e7eSjoerg report("Unable to process " + BFF.Path + ": bad format"); 63106f32e7eSjoerg }, 63206f32e7eSjoerg [](const FileNotFound &FNF) { 63306f32e7eSjoerg report("File not found " + FNF.Path); 63406f32e7eSjoerg })) 63506f32e7eSjoerg return Err; 63606f32e7eSjoerg 63706f32e7eSjoergIn cases where you truly know that the handler list is exhaustive the 63806f32e7eSjoerg``handleAllErrors`` function can be used instead. This is identical to 63906f32e7eSjoerg``handleErrors`` except that it will terminate the program if an unhandled 64006f32e7eSjoergerror is passed in, and can therefore return void. The ``handleAllErrors`` 64106f32e7eSjoergfunction should generally be avoided: the introduction of a new error type 64206f32e7eSjoergelsewhere in the program can easily turn a formerly exhaustive list of errors 64306f32e7eSjoerginto a non-exhaustive list, risking unexpected program termination. Where 64406f32e7eSjoergpossible, use handleErrors and propagate unknown errors up the stack instead. 64506f32e7eSjoerg 64606f32e7eSjoergFor tool code, where errors can be handled by printing an error message then 64706f32e7eSjoergexiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility 64806f32e7eSjoergmay be a better choice than handleErrors, as it simplifies control flow when 64906f32e7eSjoergcalling fallible functions. 65006f32e7eSjoerg 65106f32e7eSjoergIn situations where it is known that a particular call to a fallible function 65206f32e7eSjoergwill always succeed (for example, a call to a function that can only fail on a 65306f32e7eSjoergsubset of inputs with an input that is known to be safe) the 65406f32e7eSjoerg:ref:`cantFail <err_cantfail>` functions can be used to remove the error type, 65506f32e7eSjoergsimplifying control flow. 65606f32e7eSjoerg 65706f32e7eSjoergStringError 65806f32e7eSjoerg""""""""""" 65906f32e7eSjoerg 66006f32e7eSjoergMany kinds of errors have no recovery strategy, the only action that can be 66106f32e7eSjoergtaken is to report them to the user so that the user can attempt to fix the 66206f32e7eSjoergenvironment. In this case representing the error as a string makes perfect 66306f32e7eSjoergsense. LLVM provides the ``StringError`` class for this purpose. It takes two 66406f32e7eSjoergarguments: A string error message, and an equivalent ``std::error_code`` for 66506f32e7eSjoerginteroperability. It also provides a ``createStringError`` function to simplify 66606f32e7eSjoergcommon usage of this class: 66706f32e7eSjoerg 66806f32e7eSjoerg.. code-block:: c++ 66906f32e7eSjoerg 67006f32e7eSjoerg // These two lines of code are equivalent: 67106f32e7eSjoerg make_error<StringError>("Bad executable", errc::executable_format_error); 67206f32e7eSjoerg createStringError(errc::executable_format_error, "Bad executable"); 67306f32e7eSjoerg 67406f32e7eSjoergIf you're certain that the error you're building will never need to be converted 67506f32e7eSjoergto a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function: 67606f32e7eSjoerg 67706f32e7eSjoerg.. code-block:: c++ 67806f32e7eSjoerg 67906f32e7eSjoerg createStringError(inconvertibleErrorCode(), "Bad executable"); 68006f32e7eSjoerg 68106f32e7eSjoergThis should be done only after careful consideration. If any attempt is made to 68206f32e7eSjoergconvert this error to a ``std::error_code`` it will trigger immediate program 68306f32e7eSjoergtermination. Unless you are certain that your errors will not need 68406f32e7eSjoerginteroperability you should look for an existing ``std::error_code`` that you 68506f32e7eSjoergcan convert to, and even (as painful as it is) consider introducing a new one as 68606f32e7eSjoerga stopgap measure. 68706f32e7eSjoerg 68806f32e7eSjoerg``createStringError`` can take ``printf`` style format specifiers to provide a 68906f32e7eSjoergformatted message: 69006f32e7eSjoerg 69106f32e7eSjoerg.. code-block:: c++ 69206f32e7eSjoerg 69306f32e7eSjoerg createStringError(errc::executable_format_error, 69406f32e7eSjoerg "Bad executable: %s", FileName); 69506f32e7eSjoerg 69606f32e7eSjoergInteroperability with std::error_code and ErrorOr 69706f32e7eSjoerg""""""""""""""""""""""""""""""""""""""""""""""""" 69806f32e7eSjoerg 69906f32e7eSjoergMany existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>`` 70006f32e7eSjoerg(which plays the same role as ``Expected<T>``, but wraps a ``std::error_code`` 70106f32e7eSjoergrather than an ``Error``). The infectious nature of error types means that an 70206f32e7eSjoergattempt to change one of these functions to return ``Error`` or ``Expected<T>`` 70306f32e7eSjoerginstead often results in an avalanche of changes to callers, callers of callers, 70406f32e7eSjoergand so on. (The first such attempt, returning an ``Error`` from 70506f32e7eSjoergMachOObjectFile's constructor, was abandoned after the diff reached 3000 lines, 70606f32e7eSjoergimpacted half a dozen libraries, and was still growing). 70706f32e7eSjoerg 70806f32e7eSjoergTo solve this problem, the ``Error``/``std::error_code`` interoperability requirement was 70906f32e7eSjoergintroduced. Two pairs of functions allow any ``Error`` value to be converted to a 71006f32e7eSjoerg``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice 71106f32e7eSjoergversa: 71206f32e7eSjoerg 71306f32e7eSjoerg.. code-block:: c++ 71406f32e7eSjoerg 71506f32e7eSjoerg std::error_code errorToErrorCode(Error Err); 71606f32e7eSjoerg Error errorCodeToError(std::error_code EC); 71706f32e7eSjoerg 71806f32e7eSjoerg template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr); 71906f32e7eSjoerg template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC); 72006f32e7eSjoerg 72106f32e7eSjoerg 72206f32e7eSjoergUsing these APIs it is easy to make surgical patches that update individual 72306f32e7eSjoergfunctions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to 72406f32e7eSjoerg``Expected<T>``. 72506f32e7eSjoerg 72606f32e7eSjoergReturning Errors from error handlers 72706f32e7eSjoerg"""""""""""""""""""""""""""""""""""" 72806f32e7eSjoerg 72906f32e7eSjoergError recovery attempts may themselves fail. For that reason, ``handleErrors`` 73006f32e7eSjoergactually recognises three different forms of handler signature: 73106f32e7eSjoerg 73206f32e7eSjoerg.. code-block:: c++ 73306f32e7eSjoerg 73406f32e7eSjoerg // Error must be handled, no new errors produced: 73506f32e7eSjoerg void(UserDefinedError &E); 73606f32e7eSjoerg 73706f32e7eSjoerg // Error must be handled, new errors can be produced: 73806f32e7eSjoerg Error(UserDefinedError &E); 73906f32e7eSjoerg 74006f32e7eSjoerg // Original error can be inspected, then re-wrapped and returned (or a new 74106f32e7eSjoerg // error can be produced): 74206f32e7eSjoerg Error(std::unique_ptr<UserDefinedError> E); 74306f32e7eSjoerg 74406f32e7eSjoergAny error returned from a handler will be returned from the ``handleErrors`` 74506f32e7eSjoergfunction so that it can be handled itself, or propagated up the stack. 74606f32e7eSjoerg 74706f32e7eSjoerg.. _err_exitonerr: 74806f32e7eSjoerg 74906f32e7eSjoergUsing ExitOnError to simplify tool code 75006f32e7eSjoerg""""""""""""""""""""""""""""""""""""""" 75106f32e7eSjoerg 75206f32e7eSjoergLibrary code should never call ``exit`` for a recoverable error, however in tool 75306f32e7eSjoergcode (especially command line tools) this can be a reasonable approach. Calling 75406f32e7eSjoerg``exit`` upon encountering an error dramatically simplifies control flow as the 75506f32e7eSjoergerror no longer needs to be propagated up the stack. This allows code to be 75606f32e7eSjoergwritten in straight-line style, as long as each fallible call is wrapped in a 75706f32e7eSjoergcheck and call to exit. The ``ExitOnError`` class supports this pattern by 75806f32e7eSjoergproviding call operators that inspect ``Error`` values, stripping the error away 75906f32e7eSjoergin the success case and logging to ``stderr`` then exiting in the failure case. 76006f32e7eSjoerg 76106f32e7eSjoergTo use this class, declare a global ``ExitOnError`` variable in your program: 76206f32e7eSjoerg 76306f32e7eSjoerg.. code-block:: c++ 76406f32e7eSjoerg 76506f32e7eSjoerg ExitOnError ExitOnErr; 76606f32e7eSjoerg 76706f32e7eSjoergCalls to fallible functions can then be wrapped with a call to ``ExitOnErr``, 76806f32e7eSjoergturning them into non-failing calls: 76906f32e7eSjoerg 77006f32e7eSjoerg.. code-block:: c++ 77106f32e7eSjoerg 77206f32e7eSjoerg Error mayFail(); 77306f32e7eSjoerg Expected<int> mayFail2(); 77406f32e7eSjoerg 77506f32e7eSjoerg void foo() { 77606f32e7eSjoerg ExitOnErr(mayFail()); 77706f32e7eSjoerg int X = ExitOnErr(mayFail2()); 77806f32e7eSjoerg } 77906f32e7eSjoerg 78006f32e7eSjoergOn failure, the error's log message will be written to ``stderr``, optionally 78106f32e7eSjoergpreceded by a string "banner" that can be set by calling the setBanner method. A 78206f32e7eSjoergmapping can also be supplied from ``Error`` values to exit codes using the 78306f32e7eSjoerg``setExitCodeMapper`` method: 78406f32e7eSjoerg 78506f32e7eSjoerg.. code-block:: c++ 78606f32e7eSjoerg 78706f32e7eSjoerg int main(int argc, char *argv[]) { 78806f32e7eSjoerg ExitOnErr.setBanner(std::string(argv[0]) + " error:"); 78906f32e7eSjoerg ExitOnErr.setExitCodeMapper( 79006f32e7eSjoerg [](const Error &Err) { 79106f32e7eSjoerg if (Err.isA<BadFileFormat>()) 79206f32e7eSjoerg return 2; 79306f32e7eSjoerg return 1; 79406f32e7eSjoerg }); 79506f32e7eSjoerg 79606f32e7eSjoergUse ``ExitOnError`` in your tool code where possible as it can greatly improve 79706f32e7eSjoergreadability. 79806f32e7eSjoerg 79906f32e7eSjoerg.. _err_cantfail: 80006f32e7eSjoerg 80106f32e7eSjoergUsing cantFail to simplify safe callsites 80206f32e7eSjoerg""""""""""""""""""""""""""""""""""""""""" 80306f32e7eSjoerg 80406f32e7eSjoergSome functions may only fail for a subset of their inputs, so calls using known 80506f32e7eSjoergsafe inputs can be assumed to succeed. 80606f32e7eSjoerg 80706f32e7eSjoergThe cantFail functions encapsulate this by wrapping an assertion that their 80806f32e7eSjoergargument is a success value and, in the case of Expected<T>, unwrapping the 80906f32e7eSjoergT value: 81006f32e7eSjoerg 81106f32e7eSjoerg.. code-block:: c++ 81206f32e7eSjoerg 81306f32e7eSjoerg Error onlyFailsForSomeXValues(int X); 81406f32e7eSjoerg Expected<int> onlyFailsForSomeXValues2(int X); 81506f32e7eSjoerg 81606f32e7eSjoerg void foo() { 81706f32e7eSjoerg cantFail(onlyFailsForSomeXValues(KnownSafeValue)); 81806f32e7eSjoerg int Y = cantFail(onlyFailsForSomeXValues2(KnownSafeValue)); 81906f32e7eSjoerg ... 82006f32e7eSjoerg } 82106f32e7eSjoerg 82206f32e7eSjoergLike the ExitOnError utility, cantFail simplifies control flow. Their treatment 82306f32e7eSjoergof error cases is very different however: Where ExitOnError is guaranteed to 82406f32e7eSjoergterminate the program on an error input, cantFail simply asserts that the result 82506f32e7eSjoergis success. In debug builds this will result in an assertion failure if an error 82606f32e7eSjoergis encountered. In release builds the behavior of cantFail for failure values is 82706f32e7eSjoergundefined. As such, care must be taken in the use of cantFail: clients must be 82806f32e7eSjoergcertain that a cantFail wrapped call really can not fail with the given 82906f32e7eSjoergarguments. 83006f32e7eSjoerg 83106f32e7eSjoergUse of the cantFail functions should be rare in library code, but they are 83206f32e7eSjoerglikely to be of more use in tool and unit-test code where inputs and/or 83306f32e7eSjoergmocked-up classes or functions may be known to be safe. 83406f32e7eSjoerg 83506f32e7eSjoergFallible constructors 83606f32e7eSjoerg""""""""""""""""""""" 83706f32e7eSjoerg 83806f32e7eSjoergSome classes require resource acquisition or other complex initialization that 83906f32e7eSjoergcan fail during construction. Unfortunately constructors can't return errors, 84006f32e7eSjoergand having clients test objects after they're constructed to ensure that they're 84106f32e7eSjoergvalid is error prone as it's all too easy to forget the test. To work around 84206f32e7eSjoergthis, use the named constructor idiom and return an ``Expected<T>``: 84306f32e7eSjoerg 84406f32e7eSjoerg.. code-block:: c++ 84506f32e7eSjoerg 84606f32e7eSjoerg class Foo { 84706f32e7eSjoerg public: 84806f32e7eSjoerg 84906f32e7eSjoerg static Expected<Foo> Create(Resource R1, Resource R2) { 850*da58b97aSjoerg Error Err = Error::success(); 85106f32e7eSjoerg Foo F(R1, R2, Err); 85206f32e7eSjoerg if (Err) 85306f32e7eSjoerg return std::move(Err); 85406f32e7eSjoerg return std::move(F); 85506f32e7eSjoerg } 85606f32e7eSjoerg 85706f32e7eSjoerg private: 85806f32e7eSjoerg 85906f32e7eSjoerg Foo(Resource R1, Resource R2, Error &Err) { 86006f32e7eSjoerg ErrorAsOutParameter EAO(&Err); 86106f32e7eSjoerg if (auto Err2 = R1.acquire()) { 86206f32e7eSjoerg Err = std::move(Err2); 86306f32e7eSjoerg return; 86406f32e7eSjoerg } 86506f32e7eSjoerg Err = R2.acquire(); 86606f32e7eSjoerg } 86706f32e7eSjoerg }; 86806f32e7eSjoerg 86906f32e7eSjoerg 87006f32e7eSjoergHere, the named constructor passes an ``Error`` by reference into the actual 87106f32e7eSjoergconstructor, which the constructor can then use to return errors. The 87206f32e7eSjoerg``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry 87306f32e7eSjoergto the constructor so that the error can be assigned to, then resets it on exit 87406f32e7eSjoergto force the client (the named constructor) to check the error. 87506f32e7eSjoerg 87606f32e7eSjoergBy using this idiom, clients attempting to construct a Foo receive either a 87706f32e7eSjoergwell-formed Foo or an Error, never an object in an invalid state. 87806f32e7eSjoerg 87906f32e7eSjoergPropagating and consuming errors based on types 88006f32e7eSjoerg""""""""""""""""""""""""""""""""""""""""""""""" 88106f32e7eSjoerg 88206f32e7eSjoergIn some contexts, certain types of error are known to be benign. For example, 88306f32e7eSjoergwhen walking an archive, some clients may be happy to skip over badly formatted 88406f32e7eSjoergobject files rather than terminating the walk immediately. Skipping badly 88506f32e7eSjoergformatted objects could be achieved using an elaborate handler method, but the 88606f32e7eSjoergError.h header provides two utilities that make this idiom much cleaner: the 88706f32e7eSjoergtype inspection method, ``isA``, and the ``consumeError`` function: 88806f32e7eSjoerg 88906f32e7eSjoerg.. code-block:: c++ 89006f32e7eSjoerg 89106f32e7eSjoerg Error walkArchive(Archive A) { 89206f32e7eSjoerg for (unsigned I = 0; I != A.numMembers(); ++I) { 89306f32e7eSjoerg auto ChildOrErr = A.getMember(I); 89406f32e7eSjoerg if (auto Err = ChildOrErr.takeError()) { 89506f32e7eSjoerg if (Err.isA<BadFileFormat>()) 89606f32e7eSjoerg consumeError(std::move(Err)) 89706f32e7eSjoerg else 89806f32e7eSjoerg return Err; 89906f32e7eSjoerg } 90006f32e7eSjoerg auto &Child = *ChildOrErr; 90106f32e7eSjoerg // Use Child 90206f32e7eSjoerg ... 90306f32e7eSjoerg } 90406f32e7eSjoerg return Error::success(); 90506f32e7eSjoerg } 90606f32e7eSjoerg 90706f32e7eSjoergConcatenating Errors with joinErrors 90806f32e7eSjoerg"""""""""""""""""""""""""""""""""""" 90906f32e7eSjoerg 91006f32e7eSjoergIn the archive walking example above ``BadFileFormat`` errors are simply 91106f32e7eSjoergconsumed and ignored. If the client had wanted report these errors after 91206f32e7eSjoergcompleting the walk over the archive they could use the ``joinErrors`` utility: 91306f32e7eSjoerg 91406f32e7eSjoerg.. code-block:: c++ 91506f32e7eSjoerg 91606f32e7eSjoerg Error walkArchive(Archive A) { 91706f32e7eSjoerg Error DeferredErrs = Error::success(); 91806f32e7eSjoerg for (unsigned I = 0; I != A.numMembers(); ++I) { 91906f32e7eSjoerg auto ChildOrErr = A.getMember(I); 92006f32e7eSjoerg if (auto Err = ChildOrErr.takeError()) 92106f32e7eSjoerg if (Err.isA<BadFileFormat>()) 92206f32e7eSjoerg DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err)); 92306f32e7eSjoerg else 92406f32e7eSjoerg return Err; 92506f32e7eSjoerg auto &Child = *ChildOrErr; 92606f32e7eSjoerg // Use Child 92706f32e7eSjoerg ... 92806f32e7eSjoerg } 92906f32e7eSjoerg return DeferredErrs; 93006f32e7eSjoerg } 93106f32e7eSjoerg 93206f32e7eSjoergThe ``joinErrors`` routine builds a special error type called ``ErrorList``, 93306f32e7eSjoergwhich holds a list of user defined errors. The ``handleErrors`` routine 93406f32e7eSjoergrecognizes this type and will attempt to handle each of the contained errors in 93506f32e7eSjoergorder. If all contained errors can be handled, ``handleErrors`` will return 93606f32e7eSjoerg``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining 93706f32e7eSjoergerrors and return the resulting ``ErrorList``. 93806f32e7eSjoerg 93906f32e7eSjoergBuilding fallible iterators and iterator ranges 94006f32e7eSjoerg""""""""""""""""""""""""""""""""""""""""""""""" 94106f32e7eSjoerg 94206f32e7eSjoergThe archive walking examples above retrieve archive members by index, however 94306f32e7eSjoergthis requires considerable boiler-plate for iteration and error checking. We can 94406f32e7eSjoergclean this up by using the "fallible iterator" pattern, which supports the 94506f32e7eSjoergfollowing natural iteration idiom for fallible containers like Archive: 94606f32e7eSjoerg 94706f32e7eSjoerg.. code-block:: c++ 94806f32e7eSjoerg 949*da58b97aSjoerg Error Err = Error::success(); 95006f32e7eSjoerg for (auto &Child : Ar->children(Err)) { 95106f32e7eSjoerg // Use Child - only enter the loop when it's valid 95206f32e7eSjoerg 95306f32e7eSjoerg // Allow early exit from the loop body, since we know that Err is success 95406f32e7eSjoerg // when we're inside the loop. 95506f32e7eSjoerg if (BailOutOn(Child)) 95606f32e7eSjoerg return; 95706f32e7eSjoerg 95806f32e7eSjoerg ... 95906f32e7eSjoerg } 96006f32e7eSjoerg // Check Err after the loop to ensure it didn't break due to an error. 96106f32e7eSjoerg if (Err) 96206f32e7eSjoerg return Err; 96306f32e7eSjoerg 96406f32e7eSjoergTo enable this idiom, iterators over fallible containers are written in a 96506f32e7eSjoergnatural style, with their ``++`` and ``--`` operators replaced with fallible 96606f32e7eSjoerg``Error inc()`` and ``Error dec()`` functions. E.g.: 96706f32e7eSjoerg 96806f32e7eSjoerg.. code-block:: c++ 96906f32e7eSjoerg 97006f32e7eSjoerg class FallibleChildIterator { 97106f32e7eSjoerg public: 97206f32e7eSjoerg FallibleChildIterator(Archive &A, unsigned ChildIdx); 97306f32e7eSjoerg Archive::Child &operator*(); 97406f32e7eSjoerg friend bool operator==(const ArchiveIterator &LHS, 97506f32e7eSjoerg const ArchiveIterator &RHS); 97606f32e7eSjoerg 97706f32e7eSjoerg // operator++/operator-- replaced with fallible increment / decrement: 97806f32e7eSjoerg Error inc() { 97906f32e7eSjoerg if (!A.childValid(ChildIdx + 1)) 98006f32e7eSjoerg return make_error<BadArchiveMember>(...); 98106f32e7eSjoerg ++ChildIdx; 98206f32e7eSjoerg return Error::success(); 98306f32e7eSjoerg } 98406f32e7eSjoerg 98506f32e7eSjoerg Error dec() { ... } 98606f32e7eSjoerg }; 98706f32e7eSjoerg 98806f32e7eSjoergInstances of this kind of fallible iterator interface are then wrapped with the 98906f32e7eSjoergfallible_iterator utility which provides ``operator++`` and ``operator--``, 99006f32e7eSjoergreturning any errors via a reference passed in to the wrapper at construction 99106f32e7eSjoergtime. The fallible_iterator wrapper takes care of (a) jumping to the end of the 99206f32e7eSjoergrange on error, and (b) marking the error as checked whenever an iterator is 99306f32e7eSjoergcompared to ``end`` and found to be inequal (in particular: this marks the 99406f32e7eSjoergerror as checked throughout the body of a range-based for loop), enabling early 99506f32e7eSjoergexit from the loop without redundant error checking. 99606f32e7eSjoerg 99706f32e7eSjoergInstances of the fallible iterator interface (e.g. FallibleChildIterator above) 99806f32e7eSjoergare wrapped using the ``make_fallible_itr`` and ``make_fallible_end`` 99906f32e7eSjoergfunctions. E.g.: 100006f32e7eSjoerg 100106f32e7eSjoerg.. code-block:: c++ 100206f32e7eSjoerg 100306f32e7eSjoerg class Archive { 100406f32e7eSjoerg public: 100506f32e7eSjoerg using child_iterator = fallible_iterator<FallibleChildIterator>; 100606f32e7eSjoerg 100706f32e7eSjoerg child_iterator child_begin(Error &Err) { 100806f32e7eSjoerg return make_fallible_itr(FallibleChildIterator(*this, 0), Err); 100906f32e7eSjoerg } 101006f32e7eSjoerg 101106f32e7eSjoerg child_iterator child_end() { 101206f32e7eSjoerg return make_fallible_end(FallibleChildIterator(*this, size())); 101306f32e7eSjoerg } 101406f32e7eSjoerg 101506f32e7eSjoerg iterator_range<child_iterator> children(Error &Err) { 101606f32e7eSjoerg return make_range(child_begin(Err), child_end()); 101706f32e7eSjoerg } 101806f32e7eSjoerg }; 101906f32e7eSjoerg 102006f32e7eSjoergUsing the fallible_iterator utility allows for both natural construction of 102106f32e7eSjoergfallible iterators (using failing ``inc`` and ``dec`` operations) and 102206f32e7eSjoergrelatively natural use of c++ iterator/loop idioms. 102306f32e7eSjoerg 102406f32e7eSjoerg.. _function_apis: 102506f32e7eSjoerg 102606f32e7eSjoergMore information on Error and its related utilities can be found in the 102706f32e7eSjoergError.h header file. 102806f32e7eSjoerg 102906f32e7eSjoergPassing functions and other callable objects 103006f32e7eSjoerg-------------------------------------------- 103106f32e7eSjoerg 103206f32e7eSjoergSometimes you may want a function to be passed a callback object. In order to 103306f32e7eSjoergsupport lambda expressions and other function objects, you should not use the 103406f32e7eSjoergtraditional C approach of taking a function pointer and an opaque cookie: 103506f32e7eSjoerg 103606f32e7eSjoerg.. code-block:: c++ 103706f32e7eSjoerg 103806f32e7eSjoerg void takeCallback(bool (*Callback)(Function *, void *), void *Cookie); 103906f32e7eSjoerg 104006f32e7eSjoergInstead, use one of the following approaches: 104106f32e7eSjoerg 104206f32e7eSjoergFunction template 104306f32e7eSjoerg^^^^^^^^^^^^^^^^^ 104406f32e7eSjoerg 104506f32e7eSjoergIf you don't mind putting the definition of your function into a header file, 104606f32e7eSjoergmake it a function template that is templated on the callable type. 104706f32e7eSjoerg 104806f32e7eSjoerg.. code-block:: c++ 104906f32e7eSjoerg 105006f32e7eSjoerg template<typename Callable> 105106f32e7eSjoerg void takeCallback(Callable Callback) { 105206f32e7eSjoerg Callback(1, 2, 3); 105306f32e7eSjoerg } 105406f32e7eSjoerg 105506f32e7eSjoergThe ``function_ref`` class template 105606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 105706f32e7eSjoerg 105806f32e7eSjoergThe ``function_ref`` 1059*da58b97aSjoerg(`doxygen <https://llvm.org/doxygen/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class 106006f32e7eSjoergtemplate represents a reference to a callable object, templated over the type 106106f32e7eSjoergof the callable. This is a good choice for passing a callback to a function, 106206f32e7eSjoergif you don't need to hold onto the callback after the function returns. In this 106306f32e7eSjoergway, ``function_ref`` is to ``std::function`` as ``StringRef`` is to 106406f32e7eSjoerg``std::string``. 106506f32e7eSjoerg 106606f32e7eSjoerg``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from 106706f32e7eSjoergany callable object that can be called with arguments of type ``Param1``, 106806f32e7eSjoerg``Param2``, ..., and returns a value that can be converted to type ``Ret``. 106906f32e7eSjoergFor example: 107006f32e7eSjoerg 107106f32e7eSjoerg.. code-block:: c++ 107206f32e7eSjoerg 107306f32e7eSjoerg void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) { 107406f32e7eSjoerg for (BasicBlock &BB : *F) 107506f32e7eSjoerg if (Callback(&BB)) 107606f32e7eSjoerg return; 107706f32e7eSjoerg } 107806f32e7eSjoerg 107906f32e7eSjoergcan be called using: 108006f32e7eSjoerg 108106f32e7eSjoerg.. code-block:: c++ 108206f32e7eSjoerg 108306f32e7eSjoerg visitBasicBlocks(F, [&](BasicBlock *BB) { 108406f32e7eSjoerg if (process(BB)) 108506f32e7eSjoerg return isEmpty(BB); 108606f32e7eSjoerg return false; 108706f32e7eSjoerg }); 108806f32e7eSjoerg 108906f32e7eSjoergNote that a ``function_ref`` object contains pointers to external memory, so it 109006f32e7eSjoergis not generally safe to store an instance of the class (unless you know that 109106f32e7eSjoergthe external storage will not be freed). If you need this ability, consider 109206f32e7eSjoergusing ``std::function``. ``function_ref`` is small enough that it should always 109306f32e7eSjoergbe passed by value. 109406f32e7eSjoerg 109506f32e7eSjoerg.. _DEBUG: 109606f32e7eSjoerg 109706f32e7eSjoergThe ``LLVM_DEBUG()`` macro and ``-debug`` option 109806f32e7eSjoerg------------------------------------------------ 109906f32e7eSjoerg 110006f32e7eSjoergOften when working on your pass you will put a bunch of debugging printouts and 110106f32e7eSjoergother code into your pass. After you get it working, you want to remove it, but 110206f32e7eSjoergyou may need it again in the future (to work out new bugs that you run across). 110306f32e7eSjoerg 110406f32e7eSjoergNaturally, because of this, you don't want to delete the debug printouts, but 110506f32e7eSjoergyou don't want them to always be noisy. A standard compromise is to comment 110606f32e7eSjoergthem out, allowing you to enable them if you need them in the future. 110706f32e7eSjoerg 110806f32e7eSjoergThe ``llvm/Support/Debug.h`` (`doxygen 1109*da58b97aSjoerg<https://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named 111006f32e7eSjoerg``LLVM_DEBUG()`` that is a much nicer solution to this problem. Basically, you can 111106f32e7eSjoergput arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only 111206f32e7eSjoergexecuted if '``opt``' (or any other tool) is run with the '``-debug``' command 111306f32e7eSjoergline argument: 111406f32e7eSjoerg 111506f32e7eSjoerg.. code-block:: c++ 111606f32e7eSjoerg 111706f32e7eSjoerg LLVM_DEBUG(dbgs() << "I am here!\n"); 111806f32e7eSjoerg 111906f32e7eSjoergThen you can run your pass like this: 112006f32e7eSjoerg 112106f32e7eSjoerg.. code-block:: none 112206f32e7eSjoerg 112306f32e7eSjoerg $ opt < a.bc > /dev/null -mypass 112406f32e7eSjoerg <no output> 112506f32e7eSjoerg $ opt < a.bc > /dev/null -mypass -debug 112606f32e7eSjoerg I am here! 112706f32e7eSjoerg 112806f32e7eSjoergUsing the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not 112906f32e7eSjoerghave to create "yet another" command line option for the debug output for your 113006f32e7eSjoergpass. Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they 113106f32e7eSjoergdo not cause a performance impact at all (for the same reason, they should also 113206f32e7eSjoergnot contain side-effects!). 113306f32e7eSjoerg 113406f32e7eSjoergOne additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or 113506f32e7eSjoergdisable it directly in gdb. Just use "``set DebugFlag=0``" or "``set 113606f32e7eSjoergDebugFlag=1``" from the gdb if the program is running. If the program hasn't 113706f32e7eSjoergbeen started yet, you can always just run it with ``-debug``. 113806f32e7eSjoerg 113906f32e7eSjoerg.. _DEBUG_TYPE: 114006f32e7eSjoerg 114106f32e7eSjoergFine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option 114206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 114306f32e7eSjoerg 114406f32e7eSjoergSometimes you may find yourself in a situation where enabling ``-debug`` just 114506f32e7eSjoergturns on **too much** information (such as when working on the code generator). 114606f32e7eSjoergIf you want to enable debug information with more fine-grained control, you 114706f32e7eSjoergshould define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as 114806f32e7eSjoergfollows: 114906f32e7eSjoerg 115006f32e7eSjoerg.. code-block:: c++ 115106f32e7eSjoerg 115206f32e7eSjoerg #define DEBUG_TYPE "foo" 115306f32e7eSjoerg LLVM_DEBUG(dbgs() << "'foo' debug type\n"); 115406f32e7eSjoerg #undef DEBUG_TYPE 115506f32e7eSjoerg #define DEBUG_TYPE "bar" 115606f32e7eSjoerg LLVM_DEBUG(dbgs() << "'bar' debug type\n"); 115706f32e7eSjoerg #undef DEBUG_TYPE 115806f32e7eSjoerg 115906f32e7eSjoergThen you can run your pass like this: 116006f32e7eSjoerg 116106f32e7eSjoerg.. code-block:: none 116206f32e7eSjoerg 116306f32e7eSjoerg $ opt < a.bc > /dev/null -mypass 116406f32e7eSjoerg <no output> 116506f32e7eSjoerg $ opt < a.bc > /dev/null -mypass -debug 116606f32e7eSjoerg 'foo' debug type 116706f32e7eSjoerg 'bar' debug type 116806f32e7eSjoerg $ opt < a.bc > /dev/null -mypass -debug-only=foo 116906f32e7eSjoerg 'foo' debug type 117006f32e7eSjoerg $ opt < a.bc > /dev/null -mypass -debug-only=bar 117106f32e7eSjoerg 'bar' debug type 117206f32e7eSjoerg $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar 117306f32e7eSjoerg 'foo' debug type 117406f32e7eSjoerg 'bar' debug type 117506f32e7eSjoerg 117606f32e7eSjoergOf course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file, 117706f32e7eSjoergto specify the debug type for the entire module. Be careful that you only do 117806f32e7eSjoergthis after including Debug.h and not around any #include of headers. Also, you 117906f32e7eSjoergshould use names more meaningful than "foo" and "bar", because there is no 118006f32e7eSjoergsystem in place to ensure that names do not conflict. If two different modules 118106f32e7eSjoerguse the same string, they will all be turned on when the name is specified. 118206f32e7eSjoergThis allows, for example, all debug information for instruction scheduling to be 118306f32e7eSjoergenabled with ``-debug-only=InstrSched``, even if the source lives in multiple 118406f32e7eSjoergfiles. The name must not include a comma (,) as that is used to separate the 118506f32e7eSjoergarguments of the ``-debug-only`` option. 118606f32e7eSjoerg 118706f32e7eSjoergFor performance reasons, -debug-only is not available in optimized build 118806f32e7eSjoerg(``--enable-optimized``) of LLVM. 118906f32e7eSjoerg 119006f32e7eSjoergThe ``DEBUG_WITH_TYPE`` macro is also available for situations where you would 119106f32e7eSjoerglike to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It 119206f32e7eSjoergtakes an additional first parameter, which is the type to use. For example, the 119306f32e7eSjoergpreceding example could be written as: 119406f32e7eSjoerg 119506f32e7eSjoerg.. code-block:: c++ 119606f32e7eSjoerg 119706f32e7eSjoerg DEBUG_WITH_TYPE("foo", dbgs() << "'foo' debug type\n"); 119806f32e7eSjoerg DEBUG_WITH_TYPE("bar", dbgs() << "'bar' debug type\n"); 119906f32e7eSjoerg 120006f32e7eSjoerg.. _Statistic: 120106f32e7eSjoerg 120206f32e7eSjoergThe ``Statistic`` class & ``-stats`` option 120306f32e7eSjoerg------------------------------------------- 120406f32e7eSjoerg 120506f32e7eSjoergThe ``llvm/ADT/Statistic.h`` (`doxygen 1206*da58b97aSjoerg<https://llvm.org/doxygen/Statistic_8h_source.html>`__) file provides a class 120706f32e7eSjoergnamed ``Statistic`` that is used as a unified way to keep track of what the LLVM 120806f32e7eSjoergcompiler is doing and how effective various optimizations are. It is useful to 120906f32e7eSjoergsee what optimizations are contributing to making a particular program run 121006f32e7eSjoergfaster. 121106f32e7eSjoerg 121206f32e7eSjoergOften you may run your pass on some big program, and you're interested to see 121306f32e7eSjoerghow many times it makes a certain transformation. Although you can do this with 121406f32e7eSjoerghand inspection, or some ad-hoc method, this is a real pain and not very useful 121506f32e7eSjoergfor big programs. Using the ``Statistic`` class makes it very easy to keep 121606f32e7eSjoergtrack of this information, and the calculated information is presented in a 121706f32e7eSjoerguniform manner with the rest of the passes being executed. 121806f32e7eSjoerg 121906f32e7eSjoergThere are many examples of ``Statistic`` uses, but the basics of using it are as 122006f32e7eSjoergfollows: 122106f32e7eSjoerg 122206f32e7eSjoergDefine your statistic like this: 122306f32e7eSjoerg 122406f32e7eSjoerg.. code-block:: c++ 122506f32e7eSjoerg 122606f32e7eSjoerg #define DEBUG_TYPE "mypassname" // This goes before any #includes. 122706f32e7eSjoerg STATISTIC(NumXForms, "The # of times I did stuff"); 122806f32e7eSjoerg 122906f32e7eSjoergThe ``STATISTIC`` macro defines a static variable, whose name is specified by 123006f32e7eSjoergthe first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and 123106f32e7eSjoergthe description is taken from the second argument. The variable defined 123206f32e7eSjoerg("NumXForms" in this case) acts like an unsigned integer. 123306f32e7eSjoerg 123406f32e7eSjoergWhenever you make a transformation, bump the counter: 123506f32e7eSjoerg 123606f32e7eSjoerg.. code-block:: c++ 123706f32e7eSjoerg 123806f32e7eSjoerg ++NumXForms; // I did stuff! 123906f32e7eSjoerg 124006f32e7eSjoergThat's all you have to do. To get '``opt``' to print out the statistics 124106f32e7eSjoerggathered, use the '``-stats``' option: 124206f32e7eSjoerg 124306f32e7eSjoerg.. code-block:: none 124406f32e7eSjoerg 124506f32e7eSjoerg $ opt -stats -mypassname < program.bc > /dev/null 124606f32e7eSjoerg ... statistics output ... 124706f32e7eSjoerg 124806f32e7eSjoergNote that in order to use the '``-stats``' option, LLVM must be 124906f32e7eSjoergcompiled with assertions enabled. 125006f32e7eSjoerg 125106f32e7eSjoergWhen running ``opt`` on a C file from the SPEC benchmark suite, it gives a 125206f32e7eSjoergreport that looks like this: 125306f32e7eSjoerg 125406f32e7eSjoerg.. code-block:: none 125506f32e7eSjoerg 125606f32e7eSjoerg 7646 bitcodewriter - Number of normal instructions 125706f32e7eSjoerg 725 bitcodewriter - Number of oversized instructions 125806f32e7eSjoerg 129996 bitcodewriter - Number of bitcode bytes written 125906f32e7eSjoerg 2817 raise - Number of insts DCEd or constprop'd 126006f32e7eSjoerg 3213 raise - Number of cast-of-self removed 126106f32e7eSjoerg 5046 raise - Number of expression trees converted 126206f32e7eSjoerg 75 raise - Number of other getelementptr's formed 126306f32e7eSjoerg 138 raise - Number of load/store peepholes 126406f32e7eSjoerg 42 deadtypeelim - Number of unused typenames removed from symtab 126506f32e7eSjoerg 392 funcresolve - Number of varargs functions resolved 126606f32e7eSjoerg 27 globaldce - Number of global variables removed 126706f32e7eSjoerg 2 adce - Number of basic blocks removed 126806f32e7eSjoerg 134 cee - Number of branches revectored 126906f32e7eSjoerg 49 cee - Number of setcc instruction eliminated 127006f32e7eSjoerg 532 gcse - Number of loads removed 127106f32e7eSjoerg 2919 gcse - Number of instructions removed 127206f32e7eSjoerg 86 indvars - Number of canonical indvars added 127306f32e7eSjoerg 87 indvars - Number of aux indvars removed 127406f32e7eSjoerg 25 instcombine - Number of dead inst eliminate 127506f32e7eSjoerg 434 instcombine - Number of insts combined 127606f32e7eSjoerg 248 licm - Number of load insts hoisted 127706f32e7eSjoerg 1298 licm - Number of insts hoisted to a loop pre-header 127806f32e7eSjoerg 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header) 127906f32e7eSjoerg 75 mem2reg - Number of alloca's promoted 128006f32e7eSjoerg 1444 cfgsimplify - Number of blocks simplified 128106f32e7eSjoerg 128206f32e7eSjoergObviously, with so many optimizations, having a unified framework for this stuff 128306f32e7eSjoergis very nice. Making your pass fit well into the framework makes it more 128406f32e7eSjoergmaintainable and useful. 128506f32e7eSjoerg 128606f32e7eSjoerg.. _DebugCounters: 128706f32e7eSjoerg 128806f32e7eSjoergAdding debug counters to aid in debugging your code 128906f32e7eSjoerg--------------------------------------------------- 129006f32e7eSjoerg 129106f32e7eSjoergSometimes, when writing new passes, or trying to track down bugs, it 129206f32e7eSjoergis useful to be able to control whether certain things in your pass 129306f32e7eSjoerghappen or not. For example, there are times the minimization tooling 129406f32e7eSjoergcan only easily give you large testcases. You would like to narrow 129506f32e7eSjoergyour bug down to a specific transformation happening or not happening, 129606f32e7eSjoergautomatically, using bisection. This is where debug counters help. 129706f32e7eSjoergThey provide a framework for making parts of your code only execute a 129806f32e7eSjoergcertain number of times. 129906f32e7eSjoerg 130006f32e7eSjoergThe ``llvm/Support/DebugCounter.h`` (`doxygen 1301*da58b97aSjoerg<https://llvm.org/doxygen/DebugCounter_8h_source.html>`__) file 130206f32e7eSjoergprovides a class named ``DebugCounter`` that can be used to create 130306f32e7eSjoergcommand line counter options that control execution of parts of your code. 130406f32e7eSjoerg 130506f32e7eSjoergDefine your DebugCounter like this: 130606f32e7eSjoerg 130706f32e7eSjoerg.. code-block:: c++ 130806f32e7eSjoerg 130906f32e7eSjoerg DEBUG_COUNTER(DeleteAnInstruction, "passname-delete-instruction", 131006f32e7eSjoerg "Controls which instructions get delete"); 131106f32e7eSjoerg 131206f32e7eSjoergThe ``DEBUG_COUNTER`` macro defines a static variable, whose name 131306f32e7eSjoergis specified by the first argument. The name of the counter 131406f32e7eSjoerg(which is used on the command line) is specified by the second 131506f32e7eSjoergargument, and the description used in the help is specified by the 131606f32e7eSjoergthird argument. 131706f32e7eSjoerg 131806f32e7eSjoergWhatever code you want that control, use ``DebugCounter::shouldExecute`` to control it. 131906f32e7eSjoerg 132006f32e7eSjoerg.. code-block:: c++ 132106f32e7eSjoerg 132206f32e7eSjoerg if (DebugCounter::shouldExecute(DeleteAnInstruction)) 132306f32e7eSjoerg I->eraseFromParent(); 132406f32e7eSjoerg 132506f32e7eSjoergThat's all you have to do. Now, using opt, you can control when this code triggers using 132606f32e7eSjoergthe '``--debug-counter``' option. There are two counters provided, ``skip`` and ``count``. 132706f32e7eSjoerg``skip`` is the number of times to skip execution of the codepath. ``count`` is the number 132806f32e7eSjoergof times, once we are done skipping, to execute the codepath. 132906f32e7eSjoerg 133006f32e7eSjoerg.. code-block:: none 133106f32e7eSjoerg 133206f32e7eSjoerg $ opt --debug-counter=passname-delete-instruction-skip=1,passname-delete-instruction-count=2 -passname 133306f32e7eSjoerg 133406f32e7eSjoergThis will skip the above code the first time we hit it, then execute it twice, then skip the rest of the executions. 133506f32e7eSjoerg 133606f32e7eSjoergSo if executed on the following code: 133706f32e7eSjoerg 133806f32e7eSjoerg.. code-block:: llvm 133906f32e7eSjoerg 134006f32e7eSjoerg %1 = add i32 %a, %b 134106f32e7eSjoerg %2 = add i32 %a, %b 134206f32e7eSjoerg %3 = add i32 %a, %b 134306f32e7eSjoerg %4 = add i32 %a, %b 134406f32e7eSjoerg 134506f32e7eSjoergIt would delete number ``%2`` and ``%3``. 134606f32e7eSjoerg 134706f32e7eSjoergA utility is provided in `utils/bisect-skip-count` to binary search 134806f32e7eSjoergskip and count arguments. It can be used to automatically minimize the 134906f32e7eSjoergskip and count for a debug-counter variable. 135006f32e7eSjoerg 135106f32e7eSjoerg.. _ViewGraph: 135206f32e7eSjoerg 135306f32e7eSjoergViewing graphs while debugging code 135406f32e7eSjoerg----------------------------------- 135506f32e7eSjoerg 135606f32e7eSjoergSeveral of the important data structures in LLVM are graphs: for example CFGs 135706f32e7eSjoergmade out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM 135806f32e7eSjoerg:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection 135906f32e7eSjoergDAGs <SelectionDAG>`. In many cases, while debugging various parts of the 136006f32e7eSjoergcompiler, it is nice to instantly visualize these graphs. 136106f32e7eSjoerg 136206f32e7eSjoergLLVM provides several callbacks that are available in a debug build to do 136306f32e7eSjoergexactly that. If you call the ``Function::viewCFG()`` method, for example, the 136406f32e7eSjoergcurrent LLVM tool will pop up a window containing the CFG for the function where 136506f32e7eSjoergeach basic block is a node in the graph, and each node contains the instructions 136606f32e7eSjoergin the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does 136706f32e7eSjoergnot include the instructions), the ``MachineFunction::viewCFG()`` and 136806f32e7eSjoerg``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()`` 136906f32e7eSjoergmethods. Within GDB, for example, you can usually use something like ``call 137006f32e7eSjoergDAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to 137106f32e7eSjoergthese functions in your code in places you want to debug. 137206f32e7eSjoerg 137306f32e7eSjoergGetting this to work requires a small amount of setup. On Unix systems 137406f32e7eSjoergwith X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make 137506f32e7eSjoergsure 'dot' and 'gv' are in your path. If you are running on macOS, download 137606f32e7eSjoergand install the macOS `Graphviz program 137706f32e7eSjoerg<http://www.pixelglow.com/graphviz/>`_ and add 137806f32e7eSjoerg``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to 137906f32e7eSjoergyour path. The programs need not be present when configuring, building or 138006f32e7eSjoergrunning LLVM and can simply be installed when needed during an active debug 138106f32e7eSjoergsession. 138206f32e7eSjoerg 138306f32e7eSjoerg``SelectionDAG`` has been extended to make it easier to locate *interesting* 138406f32e7eSjoergnodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node, 138506f32e7eSjoerg"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in 138606f32e7eSjoergthe specified color (choices of colors can be found at `colors 138706f32e7eSjoerg<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes 138806f32e7eSjoergcan be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can 138906f32e7eSjoergbe found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.) 139006f32e7eSjoergIf you want to restart and clear all the current graph attributes, then you can 139106f32e7eSjoerg``call DAG.clearGraphAttrs()``. 139206f32e7eSjoerg 139306f32e7eSjoergNote that graph visualization features are compiled out of Release builds to 139406f32e7eSjoergreduce file size. This means that you need a Debug+Asserts or Release+Asserts 139506f32e7eSjoergbuild to use these features. 139606f32e7eSjoerg 139706f32e7eSjoerg.. _datastructure: 139806f32e7eSjoerg 139906f32e7eSjoergPicking the Right Data Structure for a Task 140006f32e7eSjoerg=========================================== 140106f32e7eSjoerg 140206f32e7eSjoergLLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we 140306f32e7eSjoergcommonly use STL data structures. This section describes the trade-offs you 140406f32e7eSjoergshould consider when you pick one. 140506f32e7eSjoerg 140606f32e7eSjoergThe first step is a choose your own adventure: do you want a sequential 140706f32e7eSjoergcontainer, a set-like container, or a map-like container? The most important 140806f32e7eSjoergthing when choosing a container is the algorithmic properties of how you plan to 140906f32e7eSjoergaccess the container. Based on that, you should use: 141006f32e7eSjoerg 141106f32e7eSjoerg 141206f32e7eSjoerg* a :ref:`map-like <ds_map>` container if you need efficient look-up of a 141306f32e7eSjoerg value based on another value. Map-like containers also support efficient 141406f32e7eSjoerg queries for containment (whether a key is in the map). Map-like containers 141506f32e7eSjoerg generally do not support efficient reverse mapping (values to keys). If you 141606f32e7eSjoerg need that, use two maps. Some map-like containers also support efficient 141706f32e7eSjoerg iteration through the keys in sorted order. Map-like containers are the most 141806f32e7eSjoerg expensive sort, only use them if you need one of these capabilities. 141906f32e7eSjoerg 142006f32e7eSjoerg* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into 142106f32e7eSjoerg a container that automatically eliminates duplicates. Some set-like 142206f32e7eSjoerg containers support efficient iteration through the elements in sorted order. 142306f32e7eSjoerg Set-like containers are more expensive than sequential containers. 142406f32e7eSjoerg 142506f32e7eSjoerg* a :ref:`sequential <ds_sequential>` container provides the most efficient way 142606f32e7eSjoerg to add elements and keeps track of the order they are added to the collection. 142706f32e7eSjoerg They permit duplicates and support efficient iteration, but do not support 142806f32e7eSjoerg efficient look-up based on a key. 142906f32e7eSjoerg 143006f32e7eSjoerg* a :ref:`string <ds_string>` container is a specialized sequential container or 143106f32e7eSjoerg reference structure that is used for character or byte arrays. 143206f32e7eSjoerg 143306f32e7eSjoerg* a :ref:`bit <ds_bit>` container provides an efficient way to store and 143406f32e7eSjoerg perform set operations on sets of numeric id's, while automatically 143506f32e7eSjoerg eliminating duplicates. Bit containers require a maximum of 1 bit for each 143606f32e7eSjoerg identifier you want to store. 143706f32e7eSjoerg 143806f32e7eSjoergOnce the proper category of container is determined, you can fine tune the 143906f32e7eSjoergmemory use, constant factors, and cache behaviors of access by intelligently 144006f32e7eSjoergpicking a member of the category. Note that constant factors and cache behavior 144106f32e7eSjoergcan be a big deal. If you have a vector that usually only contains a few 144206f32e7eSjoergelements (but could contain many), for example, it's much better to use 144306f32e7eSjoerg:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so 144406f32e7eSjoergavoids (relatively) expensive malloc/free calls, which dwarf the cost of adding 144506f32e7eSjoergthe elements to the container. 144606f32e7eSjoerg 144706f32e7eSjoerg.. _ds_sequential: 144806f32e7eSjoerg 144906f32e7eSjoergSequential Containers (std::vector, std::list, etc) 145006f32e7eSjoerg--------------------------------------------------- 145106f32e7eSjoerg 145206f32e7eSjoergThere are a variety of sequential containers available for you, based on your 145306f32e7eSjoergneeds. Pick the first in this section that will do what you want. 145406f32e7eSjoerg 145506f32e7eSjoerg.. _dss_arrayref: 145606f32e7eSjoerg 145706f32e7eSjoergllvm/ADT/ArrayRef.h 145806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 145906f32e7eSjoerg 146006f32e7eSjoergThe ``llvm::ArrayRef`` class is the preferred class to use in an interface that 146106f32e7eSjoergaccepts a sequential list of elements in memory and just reads from them. By 146206f32e7eSjoergtaking an ``ArrayRef``, the API can be passed a fixed size array, an 146306f32e7eSjoerg``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous 146406f32e7eSjoergin memory. 146506f32e7eSjoerg 146606f32e7eSjoerg.. _dss_fixedarrays: 146706f32e7eSjoerg 146806f32e7eSjoergFixed Size Arrays 146906f32e7eSjoerg^^^^^^^^^^^^^^^^^ 147006f32e7eSjoerg 147106f32e7eSjoergFixed size arrays are very simple and very fast. They are good if you know 147206f32e7eSjoergexactly how many elements you have, or you have a (low) upper bound on how many 147306f32e7eSjoergyou have. 147406f32e7eSjoerg 147506f32e7eSjoerg.. _dss_heaparrays: 147606f32e7eSjoerg 147706f32e7eSjoergHeap Allocated Arrays 147806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^ 147906f32e7eSjoerg 148006f32e7eSjoergHeap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good 148106f32e7eSjoergif the number of elements is variable, if you know how many elements you will 148206f32e7eSjoergneed before the array is allocated, and if the array is usually large (if not, 148306f32e7eSjoergconsider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated 148406f32e7eSjoergarray is the cost of the new/delete (aka malloc/free). Also note that if you 148506f32e7eSjoergare allocating an array of a type with a constructor, the constructor and 148606f32e7eSjoergdestructors will be run for every element in the array (re-sizable vectors only 148706f32e7eSjoergconstruct those elements actually used). 148806f32e7eSjoerg 148906f32e7eSjoerg.. _dss_tinyptrvector: 149006f32e7eSjoerg 149106f32e7eSjoergllvm/ADT/TinyPtrVector.h 149206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^ 149306f32e7eSjoerg 149406f32e7eSjoerg``TinyPtrVector<Type>`` is a highly specialized collection class that is 149506f32e7eSjoergoptimized to avoid allocation in the case when a vector has zero or one 149606f32e7eSjoergelements. It has two major restrictions: 1) it can only hold values of pointer 149706f32e7eSjoergtype, and 2) it cannot hold a null pointer. 149806f32e7eSjoerg 149906f32e7eSjoergSince this container is highly specialized, it is rarely used. 150006f32e7eSjoerg 150106f32e7eSjoerg.. _dss_smallvector: 150206f32e7eSjoerg 150306f32e7eSjoergllvm/ADT/SmallVector.h 150406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^ 150506f32e7eSjoerg 150606f32e7eSjoerg``SmallVector<Type, N>`` is a simple class that looks and smells just like 150706f32e7eSjoerg``vector<Type>``: it supports efficient iteration, lays out elements in memory 150806f32e7eSjoergorder (so you can do pointer arithmetic between elements), supports efficient 150906f32e7eSjoergpush_back/pop_back operations, supports efficient random access to its elements, 151006f32e7eSjoergetc. 151106f32e7eSjoerg 151206f32e7eSjoergThe main advantage of SmallVector is that it allocates space for some number of 151306f32e7eSjoergelements (N) **in the object itself**. Because of this, if the SmallVector is 151406f32e7eSjoergdynamically smaller than N, no malloc is performed. This can be a big win in 151506f32e7eSjoergcases where the malloc/free call is far more expensive than the code that 151606f32e7eSjoergfiddles around with the elements. 151706f32e7eSjoerg 151806f32e7eSjoergThis is good for vectors that are "usually small" (e.g. the number of 151906f32e7eSjoergpredecessors/successors of a block is usually less than 8). On the other hand, 152006f32e7eSjoergthis makes the size of the SmallVector itself large, so you don't want to 152106f32e7eSjoergallocate lots of them (doing so will waste a lot of space). As such, 152206f32e7eSjoergSmallVectors are most useful when on the stack. 152306f32e7eSjoerg 1524*da58b97aSjoergIn the absence of a well-motivated choice for the number of 1525*da58b97aSjoerginlined elements ``N``, it is recommended to use ``SmallVector<T>`` (that is, 1526*da58b97aSjoergomitting the ``N``). This will choose a default number of 1527*da58b97aSjoerginlined elements reasonable for allocation on the stack (for example, trying 1528*da58b97aSjoergto keep ``sizeof(SmallVector<T>)`` around 64 bytes). 1529*da58b97aSjoerg 153006f32e7eSjoergSmallVector also provides a nice portable and efficient replacement for 153106f32e7eSjoerg``alloca``. 153206f32e7eSjoerg 153306f32e7eSjoergSmallVector has grown a few other minor advantages over std::vector, causing 153406f32e7eSjoerg``SmallVector<Type, 0>`` to be preferred over ``std::vector<Type>``. 153506f32e7eSjoerg 153606f32e7eSjoerg#. std::vector is exception-safe, and some implementations have pessimizations 153706f32e7eSjoerg that copy elements when SmallVector would move them. 153806f32e7eSjoerg 1539*da58b97aSjoerg#. SmallVector understands ``std::is_trivially_copyable<Type>`` and uses realloc aggressively. 154006f32e7eSjoerg 154106f32e7eSjoerg#. Many LLVM APIs take a SmallVectorImpl as an out parameter (see the note 154206f32e7eSjoerg below). 154306f32e7eSjoerg 154406f32e7eSjoerg#. SmallVector with N equal to 0 is smaller than std::vector on 64-bit 154506f32e7eSjoerg platforms, since it uses ``unsigned`` (instead of ``void*``) for its size 154606f32e7eSjoerg and capacity. 154706f32e7eSjoerg 154806f32e7eSjoerg.. note:: 154906f32e7eSjoerg 1550*da58b97aSjoerg Prefer to use ``ArrayRef<T>`` or ``SmallVectorImpl<T>`` as a parameter type. 155106f32e7eSjoerg 1552*da58b97aSjoerg It's rarely appropriate to use ``SmallVector<T, N>`` as a parameter type. 1553*da58b97aSjoerg If an API only reads from the vector, it should use :ref:`ArrayRef 1554*da58b97aSjoerg <dss_arrayref>`. Even if an API updates the vector the "small size" is 1555*da58b97aSjoerg unlikely to be relevant; such an API should use the ``SmallVectorImpl<T>`` 1556*da58b97aSjoerg class, which is the "vector header" (and methods) without the elements 1557*da58b97aSjoerg allocated after it. Note that ``SmallVector<T, N>`` inherits from 1558*da58b97aSjoerg ``SmallVectorImpl<T>`` so the conversion is implicit and costs nothing. E.g. 155906f32e7eSjoerg 156006f32e7eSjoerg .. code-block:: c++ 156106f32e7eSjoerg 1562*da58b97aSjoerg // DISCOURAGED: Clients cannot pass e.g. raw arrays. 1563*da58b97aSjoerg hardcodedContiguousStorage(const SmallVectorImpl<Foo> &In); 1564*da58b97aSjoerg // ENCOURAGED: Clients can pass any contiguous storage of Foo. 1565*da58b97aSjoerg allowsAnyContiguousStorage(ArrayRef<Foo> In); 1566*da58b97aSjoerg 1567*da58b97aSjoerg void someFunc1() { 1568*da58b97aSjoerg Foo Vec[] = { /* ... */ }; 1569*da58b97aSjoerg hardcodedContiguousStorage(Vec); // Error. 1570*da58b97aSjoerg allowsAnyContiguousStorage(Vec); // Works. 1571*da58b97aSjoerg } 1572*da58b97aSjoerg 1573*da58b97aSjoerg // DISCOURAGED: Clients cannot pass e.g. SmallVector<Foo, 8>. 157406f32e7eSjoerg hardcodedSmallSize(SmallVector<Foo, 2> &Out); 1575*da58b97aSjoerg // ENCOURAGED: Clients can pass any SmallVector<Foo, N>. 157606f32e7eSjoerg allowsAnySmallSize(SmallVectorImpl<Foo> &Out); 157706f32e7eSjoerg 1578*da58b97aSjoerg void someFunc2() { 157906f32e7eSjoerg SmallVector<Foo, 8> Vec; 158006f32e7eSjoerg hardcodedSmallSize(Vec); // Error. 158106f32e7eSjoerg allowsAnySmallSize(Vec); // Works. 158206f32e7eSjoerg } 158306f32e7eSjoerg 1584*da58b97aSjoerg Even though it has "``Impl``" in the name, SmallVectorImpl is widely used 1585*da58b97aSjoerg and is no longer "private to the implementation". A name like 1586*da58b97aSjoerg ``SmallVectorHeader`` might be more appropriate. 158706f32e7eSjoerg 158806f32e7eSjoerg.. _dss_vector: 158906f32e7eSjoerg 159006f32e7eSjoerg<vector> 159106f32e7eSjoerg^^^^^^^^ 159206f32e7eSjoerg 159306f32e7eSjoerg``std::vector<T>`` is well loved and respected. However, ``SmallVector<T, 0>`` 159406f32e7eSjoergis often a better option due to the advantages listed above. std::vector is 159506f32e7eSjoergstill useful when you need to store more than ``UINT32_MAX`` elements or when 159606f32e7eSjoerginterfacing with code that expects vectors :). 159706f32e7eSjoerg 159806f32e7eSjoergOne worthwhile note about std::vector: avoid code like this: 159906f32e7eSjoerg 160006f32e7eSjoerg.. code-block:: c++ 160106f32e7eSjoerg 160206f32e7eSjoerg for ( ... ) { 160306f32e7eSjoerg std::vector<foo> V; 160406f32e7eSjoerg // make use of V. 160506f32e7eSjoerg } 160606f32e7eSjoerg 160706f32e7eSjoergInstead, write this as: 160806f32e7eSjoerg 160906f32e7eSjoerg.. code-block:: c++ 161006f32e7eSjoerg 161106f32e7eSjoerg std::vector<foo> V; 161206f32e7eSjoerg for ( ... ) { 161306f32e7eSjoerg // make use of V. 161406f32e7eSjoerg V.clear(); 161506f32e7eSjoerg } 161606f32e7eSjoerg 161706f32e7eSjoergDoing so will save (at least) one heap allocation and free per iteration of the 161806f32e7eSjoergloop. 161906f32e7eSjoerg 162006f32e7eSjoerg.. _dss_deque: 162106f32e7eSjoerg 162206f32e7eSjoerg<deque> 162306f32e7eSjoerg^^^^^^^ 162406f32e7eSjoerg 162506f32e7eSjoerg``std::deque`` is, in some senses, a generalized version of ``std::vector``. 162606f32e7eSjoergLike ``std::vector``, it provides constant time random access and other similar 162706f32e7eSjoergproperties, but it also provides efficient access to the front of the list. It 162806f32e7eSjoergdoes not guarantee continuity of elements within memory. 162906f32e7eSjoerg 163006f32e7eSjoergIn exchange for this extra flexibility, ``std::deque`` has significantly higher 163106f32e7eSjoergconstant factor costs than ``std::vector``. If possible, use ``std::vector`` or 163206f32e7eSjoergsomething cheaper. 163306f32e7eSjoerg 163406f32e7eSjoerg.. _dss_list: 163506f32e7eSjoerg 163606f32e7eSjoerg<list> 163706f32e7eSjoerg^^^^^^ 163806f32e7eSjoerg 163906f32e7eSjoerg``std::list`` is an extremely inefficient class that is rarely useful. It 164006f32e7eSjoergperforms a heap allocation for every element inserted into it, thus having an 164106f32e7eSjoergextremely high constant factor, particularly for small data types. 164206f32e7eSjoerg``std::list`` also only supports bidirectional iteration, not random access 164306f32e7eSjoergiteration. 164406f32e7eSjoerg 164506f32e7eSjoergIn exchange for this high cost, std::list supports efficient access to both ends 164606f32e7eSjoergof the list (like ``std::deque``, but unlike ``std::vector`` or 164706f32e7eSjoerg``SmallVector``). In addition, the iterator invalidation characteristics of 164806f32e7eSjoergstd::list are stronger than that of a vector class: inserting or removing an 164906f32e7eSjoergelement into the list does not invalidate iterator or pointers to other elements 165006f32e7eSjoergin the list. 165106f32e7eSjoerg 165206f32e7eSjoerg.. _dss_ilist: 165306f32e7eSjoerg 165406f32e7eSjoergllvm/ADT/ilist.h 165506f32e7eSjoerg^^^^^^^^^^^^^^^^ 165606f32e7eSjoerg 165706f32e7eSjoerg``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive, 165806f32e7eSjoergbecause it requires the element to store and provide access to the prev/next 165906f32e7eSjoergpointers for the list. 166006f32e7eSjoerg 166106f32e7eSjoerg``ilist`` has the same drawbacks as ``std::list``, and additionally requires an 166206f32e7eSjoerg``ilist_traits`` implementation for the element type, but it provides some novel 166306f32e7eSjoergcharacteristics. In particular, it can efficiently store polymorphic objects, 166406f32e7eSjoergthe traits class is informed when an element is inserted or removed from the 166506f32e7eSjoerglist, and ``ilist``\ s are guaranteed to support a constant-time splice 166606f32e7eSjoergoperation. 166706f32e7eSjoerg 166806f32e7eSjoergThese properties are exactly what we want for things like ``Instruction``\ s and 166906f32e7eSjoergbasic blocks, which is why these are implemented with ``ilist``\ s. 167006f32e7eSjoerg 167106f32e7eSjoergRelated classes of interest are explained in the following subsections: 167206f32e7eSjoerg 167306f32e7eSjoerg* :ref:`ilist_traits <dss_ilist_traits>` 167406f32e7eSjoerg 167506f32e7eSjoerg* :ref:`iplist <dss_iplist>` 167606f32e7eSjoerg 167706f32e7eSjoerg* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>` 167806f32e7eSjoerg 167906f32e7eSjoerg* :ref:`Sentinels <dss_ilist_sentinel>` 168006f32e7eSjoerg 168106f32e7eSjoerg.. _dss_packedvector: 168206f32e7eSjoerg 168306f32e7eSjoergllvm/ADT/PackedVector.h 168406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 168506f32e7eSjoerg 168606f32e7eSjoergUseful for storing a vector of values using only a few number of bits for each 168706f32e7eSjoergvalue. Apart from the standard operations of a vector-like container, it can 168806f32e7eSjoergalso perform an 'or' set operation. 168906f32e7eSjoerg 169006f32e7eSjoergFor example: 169106f32e7eSjoerg 169206f32e7eSjoerg.. code-block:: c++ 169306f32e7eSjoerg 169406f32e7eSjoerg enum State { 169506f32e7eSjoerg None = 0x0, 169606f32e7eSjoerg FirstCondition = 0x1, 169706f32e7eSjoerg SecondCondition = 0x2, 169806f32e7eSjoerg Both = 0x3 169906f32e7eSjoerg }; 170006f32e7eSjoerg 170106f32e7eSjoerg State get() { 170206f32e7eSjoerg PackedVector<State, 2> Vec1; 170306f32e7eSjoerg Vec1.push_back(FirstCondition); 170406f32e7eSjoerg 170506f32e7eSjoerg PackedVector<State, 2> Vec2; 170606f32e7eSjoerg Vec2.push_back(SecondCondition); 170706f32e7eSjoerg 170806f32e7eSjoerg Vec1 |= Vec2; 170906f32e7eSjoerg return Vec1[0]; // returns 'Both'. 171006f32e7eSjoerg } 171106f32e7eSjoerg 171206f32e7eSjoerg.. _dss_ilist_traits: 171306f32e7eSjoerg 171406f32e7eSjoergilist_traits 171506f32e7eSjoerg^^^^^^^^^^^^ 171606f32e7eSjoerg 171706f32e7eSjoerg``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>`` 171806f32e7eSjoerg(and consequently ``ilist<T>``) publicly derive from this traits class. 171906f32e7eSjoerg 172006f32e7eSjoerg.. _dss_iplist: 172106f32e7eSjoerg 172206f32e7eSjoergiplist 172306f32e7eSjoerg^^^^^^ 172406f32e7eSjoerg 172506f32e7eSjoerg``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower 172606f32e7eSjoerginterface. Notably, inserters from ``T&`` are absent. 172706f32e7eSjoerg 172806f32e7eSjoerg``ilist_traits<T>`` is a public base of this class and can be used for a wide 172906f32e7eSjoergvariety of customizations. 173006f32e7eSjoerg 173106f32e7eSjoerg.. _dss_ilist_node: 173206f32e7eSjoerg 173306f32e7eSjoergllvm/ADT/ilist_node.h 173406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^ 173506f32e7eSjoerg 173606f32e7eSjoerg``ilist_node<T>`` implements the forward and backward links that are expected 173706f32e7eSjoergby the ``ilist<T>`` (and analogous containers) in the default manner. 173806f32e7eSjoerg 173906f32e7eSjoerg``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually 174006f32e7eSjoerg``T`` publicly derives from ``ilist_node<T>``. 174106f32e7eSjoerg 174206f32e7eSjoerg.. _dss_ilist_sentinel: 174306f32e7eSjoerg 174406f32e7eSjoergSentinels 174506f32e7eSjoerg^^^^^^^^^ 174606f32e7eSjoerg 174706f32e7eSjoerg``ilist``\ s have another specialty that must be considered. To be a good 174806f32e7eSjoergcitizen in the C++ ecosystem, it needs to support the standard container 174906f32e7eSjoergoperations, such as ``begin`` and ``end`` iterators, etc. Also, the 175006f32e7eSjoerg``operator--`` must work correctly on the ``end`` iterator in the case of 175106f32e7eSjoergnon-empty ``ilist``\ s. 175206f32e7eSjoerg 175306f32e7eSjoergThe only sensible solution to this problem is to allocate a so-called *sentinel* 175406f32e7eSjoergalong with the intrusive list, which serves as the ``end`` iterator, providing 175506f32e7eSjoergthe back-link to the last element. However conforming to the C++ convention it 175606f32e7eSjoergis illegal to ``operator++`` beyond the sentinel and it also must not be 175706f32e7eSjoergdereferenced. 175806f32e7eSjoerg 175906f32e7eSjoergThese constraints allow for some implementation freedom to the ``ilist`` how to 176006f32e7eSjoergallocate and store the sentinel. The corresponding policy is dictated by 176106f32e7eSjoerg``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need 176206f32e7eSjoergfor a sentinel arises. 176306f32e7eSjoerg 176406f32e7eSjoergWhile the default policy is sufficient in most cases, it may break down when 176506f32e7eSjoerg``T`` does not provide a default constructor. Also, in the case of many 176606f32e7eSjoerginstances of ``ilist``\ s, the memory overhead of the associated sentinels is 176706f32e7eSjoergwasted. To alleviate the situation with numerous and voluminous 176806f32e7eSjoerg``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*. 176906f32e7eSjoerg 177006f32e7eSjoergGhostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which 177106f32e7eSjoergsuperpose the sentinel with the ``ilist`` instance in memory. Pointer 177206f32e7eSjoergarithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s 177306f32e7eSjoerg``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves 177406f32e7eSjoergas the back-link of the sentinel. This is the only field in the ghostly 177506f32e7eSjoergsentinel which can be legally accessed. 177606f32e7eSjoerg 177706f32e7eSjoerg.. _dss_other: 177806f32e7eSjoerg 177906f32e7eSjoergOther Sequential Container options 178006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 178106f32e7eSjoerg 178206f32e7eSjoergOther STL containers are available, such as ``std::string``. 178306f32e7eSjoerg 178406f32e7eSjoergThere are also various STL adapter classes such as ``std::queue``, 178506f32e7eSjoerg``std::priority_queue``, ``std::stack``, etc. These provide simplified access 178606f32e7eSjoergto an underlying container but don't affect the cost of the container itself. 178706f32e7eSjoerg 178806f32e7eSjoerg.. _ds_string: 178906f32e7eSjoerg 179006f32e7eSjoergString-like containers 179106f32e7eSjoerg---------------------- 179206f32e7eSjoerg 179306f32e7eSjoergThere are a variety of ways to pass around and use strings in C and C++, and 179406f32e7eSjoergLLVM adds a few new options to choose from. Pick the first option on this list 179506f32e7eSjoergthat will do what you need, they are ordered according to their relative cost. 179606f32e7eSjoerg 179706f32e7eSjoergNote that it is generally preferred to *not* pass strings around as ``const 179806f32e7eSjoergchar*``'s. These have a number of problems, including the fact that they 179906f32e7eSjoergcannot represent embedded nul ("\0") characters, and do not have a length 180006f32e7eSjoergavailable efficiently. The general replacement for '``const char*``' is 180106f32e7eSjoergStringRef. 180206f32e7eSjoerg 180306f32e7eSjoergFor more information on choosing string containers for APIs, please see 180406f32e7eSjoerg:ref:`Passing Strings <string_apis>`. 180506f32e7eSjoerg 180606f32e7eSjoerg.. _dss_stringref: 180706f32e7eSjoerg 180806f32e7eSjoergllvm/ADT/StringRef.h 180906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 181006f32e7eSjoerg 181106f32e7eSjoergThe StringRef class is a simple value class that contains a pointer to a 181206f32e7eSjoergcharacter and a length, and is quite related to the :ref:`ArrayRef 181306f32e7eSjoerg<dss_arrayref>` class (but specialized for arrays of characters). Because 181406f32e7eSjoergStringRef carries a length with it, it safely handles strings with embedded nul 181506f32e7eSjoergcharacters in it, getting the length does not require a strlen call, and it even 181606f32e7eSjoerghas very convenient APIs for slicing and dicing the character range that it 181706f32e7eSjoergrepresents. 181806f32e7eSjoerg 181906f32e7eSjoergStringRef is ideal for passing simple strings around that are known to be live, 182006f32e7eSjoergeither because they are C string literals, std::string, a C array, or a 182106f32e7eSjoergSmallVector. Each of these cases has an efficient implicit conversion to 182206f32e7eSjoergStringRef, which doesn't result in a dynamic strlen being executed. 182306f32e7eSjoerg 182406f32e7eSjoergStringRef has a few major limitations which make more powerful string containers 182506f32e7eSjoerguseful: 182606f32e7eSjoerg 182706f32e7eSjoerg#. You cannot directly convert a StringRef to a 'const char*' because there is 182806f32e7eSjoerg no way to add a trailing nul (unlike the .c_str() method on various stronger 182906f32e7eSjoerg classes). 183006f32e7eSjoerg 183106f32e7eSjoerg#. StringRef doesn't own or keep alive the underlying string bytes. 183206f32e7eSjoerg As such it can easily lead to dangling pointers, and is not suitable for 183306f32e7eSjoerg embedding in datastructures in most cases (instead, use an std::string or 183406f32e7eSjoerg something like that). 183506f32e7eSjoerg 183606f32e7eSjoerg#. For the same reason, StringRef cannot be used as the return value of a 183706f32e7eSjoerg method if the method "computes" the result string. Instead, use std::string. 183806f32e7eSjoerg 183906f32e7eSjoerg#. StringRef's do not allow you to mutate the pointed-to string bytes and it 184006f32e7eSjoerg doesn't allow you to insert or remove bytes from the range. For editing 184106f32e7eSjoerg operations like this, it interoperates with the :ref:`Twine <dss_twine>` 184206f32e7eSjoerg class. 184306f32e7eSjoerg 184406f32e7eSjoergBecause of its strengths and limitations, it is very common for a function to 184506f32e7eSjoergtake a StringRef and for a method on an object to return a StringRef that points 184606f32e7eSjoerginto some string that it owns. 184706f32e7eSjoerg 184806f32e7eSjoerg.. _dss_twine: 184906f32e7eSjoerg 185006f32e7eSjoergllvm/ADT/Twine.h 185106f32e7eSjoerg^^^^^^^^^^^^^^^^ 185206f32e7eSjoerg 185306f32e7eSjoergThe Twine class is used as an intermediary datatype for APIs that want to take a 185406f32e7eSjoergstring that can be constructed inline with a series of concatenations. Twine 185506f32e7eSjoergworks by forming recursive instances of the Twine datatype (a simple value 185606f32e7eSjoergobject) on the stack as temporary objects, linking them together into a tree 185706f32e7eSjoergwhich is then linearized when the Twine is consumed. Twine is only safe to use 185806f32e7eSjoergas the argument to a function, and should always be a const reference, e.g.: 185906f32e7eSjoerg 186006f32e7eSjoerg.. code-block:: c++ 186106f32e7eSjoerg 186206f32e7eSjoerg void foo(const Twine &T); 186306f32e7eSjoerg ... 186406f32e7eSjoerg StringRef X = ... 186506f32e7eSjoerg unsigned i = ... 186606f32e7eSjoerg foo(X + "." + Twine(i)); 186706f32e7eSjoerg 186806f32e7eSjoergThis example forms a string like "blarg.42" by concatenating the values 186906f32e7eSjoergtogether, and does not form intermediate strings containing "blarg" or "blarg.". 187006f32e7eSjoerg 187106f32e7eSjoergBecause Twine is constructed with temporary objects on the stack, and because 187206f32e7eSjoergthese instances are destroyed at the end of the current statement, it is an 187306f32e7eSjoerginherently dangerous API. For example, this simple variant contains undefined 187406f32e7eSjoergbehavior and will probably crash: 187506f32e7eSjoerg 187606f32e7eSjoerg.. code-block:: c++ 187706f32e7eSjoerg 187806f32e7eSjoerg void foo(const Twine &T); 187906f32e7eSjoerg ... 188006f32e7eSjoerg StringRef X = ... 188106f32e7eSjoerg unsigned i = ... 188206f32e7eSjoerg const Twine &Tmp = X + "." + Twine(i); 188306f32e7eSjoerg foo(Tmp); 188406f32e7eSjoerg 188506f32e7eSjoerg... because the temporaries are destroyed before the call. That said, Twine's 188606f32e7eSjoergare much more efficient than intermediate std::string temporaries, and they work 188706f32e7eSjoergreally well with StringRef. Just be aware of their limitations. 188806f32e7eSjoerg 188906f32e7eSjoerg.. _dss_smallstring: 189006f32e7eSjoerg 189106f32e7eSjoergllvm/ADT/SmallString.h 189206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^ 189306f32e7eSjoerg 189406f32e7eSjoergSmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some 189506f32e7eSjoergconvenience APIs like += that takes StringRef's. SmallString avoids allocating 189606f32e7eSjoergmemory in the case when the preallocated space is enough to hold its data, and 189706f32e7eSjoergit calls back to general heap allocation when required. Since it owns its data, 189806f32e7eSjoergit is very safe to use and supports full mutation of the string. 189906f32e7eSjoerg 190006f32e7eSjoergLike SmallVector's, the big downside to SmallString is their sizeof. While they 190106f32e7eSjoergare optimized for small strings, they themselves are not particularly small. 190206f32e7eSjoergThis means that they work great for temporary scratch buffers on the stack, but 190306f32e7eSjoergshould not generally be put into the heap: it is very rare to see a SmallString 190406f32e7eSjoergas the member of a frequently-allocated heap data structure or returned 190506f32e7eSjoergby-value. 190606f32e7eSjoerg 190706f32e7eSjoerg.. _dss_stdstring: 190806f32e7eSjoerg 190906f32e7eSjoergstd::string 191006f32e7eSjoerg^^^^^^^^^^^ 191106f32e7eSjoerg 191206f32e7eSjoergThe standard C++ std::string class is a very general class that (like 191306f32e7eSjoergSmallString) owns its underlying data. sizeof(std::string) is very reasonable 191406f32e7eSjoergso it can be embedded into heap data structures and returned by-value. On the 191506f32e7eSjoergother hand, std::string is highly inefficient for inline editing (e.g. 191606f32e7eSjoergconcatenating a bunch of stuff together) and because it is provided by the 191706f32e7eSjoergstandard library, its performance characteristics depend a lot of the host 191806f32e7eSjoergstandard library (e.g. libc++ and MSVC provide a highly optimized string class, 191906f32e7eSjoergGCC contains a really slow implementation). 192006f32e7eSjoerg 192106f32e7eSjoergThe major disadvantage of std::string is that almost every operation that makes 192206f32e7eSjoergthem larger can allocate memory, which is slow. As such, it is better to use 192306f32e7eSjoergSmallVector or Twine as a scratch buffer, but then use std::string to persist 192406f32e7eSjoergthe result. 192506f32e7eSjoerg 192606f32e7eSjoerg.. _ds_set: 192706f32e7eSjoerg 192806f32e7eSjoergSet-Like Containers (std::set, SmallSet, SetVector, etc) 192906f32e7eSjoerg-------------------------------------------------------- 193006f32e7eSjoerg 193106f32e7eSjoergSet-like containers are useful when you need to canonicalize multiple values 193206f32e7eSjoerginto a single representation. There are several different choices for how to do 193306f32e7eSjoergthis, providing various trade-offs. 193406f32e7eSjoerg 193506f32e7eSjoerg.. _dss_sortedvectorset: 193606f32e7eSjoerg 193706f32e7eSjoergA sorted 'vector' 193806f32e7eSjoerg^^^^^^^^^^^^^^^^^ 193906f32e7eSjoerg 194006f32e7eSjoergIf you intend to insert a lot of elements, then do a lot of queries, a great 194106f32e7eSjoergapproach is to use an std::vector (or other sequential container) with 194206f32e7eSjoergstd::sort+std::unique to remove duplicates. This approach works really well if 194306f32e7eSjoergyour usage pattern has these two distinct phases (insert then query), and can be 194406f32e7eSjoergcoupled with a good choice of :ref:`sequential container <ds_sequential>`. 194506f32e7eSjoerg 194606f32e7eSjoergThis combination provides the several nice properties: the result data is 194706f32e7eSjoergcontiguous in memory (good for cache locality), has few allocations, is easy to 194806f32e7eSjoergaddress (iterators in the final vector are just indices or pointers), and can be 194906f32e7eSjoergefficiently queried with a standard binary search (e.g. 195006f32e7eSjoerg``std::lower_bound``; if you want the whole range of elements comparing 195106f32e7eSjoergequal, use ``std::equal_range``). 195206f32e7eSjoerg 195306f32e7eSjoerg.. _dss_smallset: 195406f32e7eSjoerg 195506f32e7eSjoergllvm/ADT/SmallSet.h 195606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 195706f32e7eSjoerg 195806f32e7eSjoergIf you have a set-like data structure that is usually small and whose elements 195906f32e7eSjoergare reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has 196006f32e7eSjoergspace for N elements in place (thus, if the set is dynamically smaller than N, 196106f32e7eSjoergno malloc traffic is required) and accesses them with a simple linear search. 196206f32e7eSjoergWhen the set grows beyond N elements, it allocates a more expensive 196306f32e7eSjoergrepresentation that guarantees efficient access (for most types, it falls back 196406f32e7eSjoergto :ref:`std::set <dss_set>`, but for pointers it uses something far better, 196506f32e7eSjoerg:ref:`SmallPtrSet <dss_smallptrset>`. 196606f32e7eSjoerg 196706f32e7eSjoergThe magic of this class is that it handles small sets extremely efficiently, but 196806f32e7eSjoerggracefully handles extremely large sets without loss of efficiency. 196906f32e7eSjoerg 197006f32e7eSjoerg.. _dss_smallptrset: 197106f32e7eSjoerg 197206f32e7eSjoergllvm/ADT/SmallPtrSet.h 197306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^ 197406f32e7eSjoerg 197506f32e7eSjoerg``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of 197606f32e7eSjoergpointers is transparently implemented with a ``SmallPtrSet``). If more than N 197706f32e7eSjoerginsertions are performed, a single quadratically probed hash table is allocated 197806f32e7eSjoergand grows as needed, providing extremely efficient access (constant time 197906f32e7eSjoerginsertion/deleting/queries with low constant factors) and is very stingy with 198006f32e7eSjoergmalloc traffic. 198106f32e7eSjoerg 198206f32e7eSjoergNote that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet`` 198306f32e7eSjoergare invalidated whenever an insertion occurs. Also, the values visited by the 198406f32e7eSjoergiterators are not visited in sorted order. 198506f32e7eSjoerg 198606f32e7eSjoerg.. _dss_stringset: 198706f32e7eSjoerg 198806f32e7eSjoergllvm/ADT/StringSet.h 198906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 199006f32e7eSjoerg 199106f32e7eSjoerg``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`, 199206f32e7eSjoergand it allows efficient storage and retrieval of unique strings. 199306f32e7eSjoerg 199406f32e7eSjoergFunctionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports 199506f32e7eSjoergiteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you 199606f32e7eSjoergneed to call ``i->getKey()`` to access the item of the StringSet.) On the 199706f32e7eSjoergother hand, ``StringSet`` doesn't support range-insertion and 199806f32e7eSjoergcopy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet 199906f32e7eSjoerg<dss_smallptrset>` do support. 200006f32e7eSjoerg 200106f32e7eSjoerg.. _dss_denseset: 200206f32e7eSjoerg 200306f32e7eSjoergllvm/ADT/DenseSet.h 200406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 200506f32e7eSjoerg 200606f32e7eSjoergDenseSet is a simple quadratically probed hash table. It excels at supporting 200706f32e7eSjoergsmall values: it uses a single allocation to hold all of the pairs that are 200806f32e7eSjoergcurrently inserted in the set. DenseSet is a great way to unique small values 200906f32e7eSjoergthat are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for 201006f32e7eSjoergpointers). Note that DenseSet has the same requirements for the value type that 201106f32e7eSjoerg:ref:`DenseMap <dss_densemap>` has. 201206f32e7eSjoerg 201306f32e7eSjoerg.. _dss_sparseset: 201406f32e7eSjoerg 201506f32e7eSjoergllvm/ADT/SparseSet.h 201606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 201706f32e7eSjoerg 201806f32e7eSjoergSparseSet holds a small number of objects identified by unsigned keys of 201906f32e7eSjoergmoderate size. It uses a lot of memory, but provides operations that are almost 202006f32e7eSjoergas fast as a vector. Typical keys are physical registers, virtual registers, or 202106f32e7eSjoergnumbered basic blocks. 202206f32e7eSjoerg 202306f32e7eSjoergSparseSet is useful for algorithms that need very fast clear/find/insert/erase 202406f32e7eSjoergand fast iteration over small sets. It is not intended for building composite 202506f32e7eSjoergdata structures. 202606f32e7eSjoerg 202706f32e7eSjoerg.. _dss_sparsemultiset: 202806f32e7eSjoerg 202906f32e7eSjoergllvm/ADT/SparseMultiSet.h 203006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 203106f32e7eSjoerg 203206f32e7eSjoergSparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's 203306f32e7eSjoergdesirable attributes. Like SparseSet, it typically uses a lot of memory, but 203406f32e7eSjoergprovides operations that are almost as fast as a vector. Typical keys are 203506f32e7eSjoergphysical registers, virtual registers, or numbered basic blocks. 203606f32e7eSjoerg 203706f32e7eSjoergSparseMultiSet is useful for algorithms that need very fast 203806f32e7eSjoergclear/find/insert/erase of the entire collection, and iteration over sets of 203906f32e7eSjoergelements sharing a key. It is often a more efficient choice than using composite 204006f32e7eSjoergdata structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for 204106f32e7eSjoergbuilding composite data structures. 204206f32e7eSjoerg 204306f32e7eSjoerg.. _dss_FoldingSet: 204406f32e7eSjoerg 204506f32e7eSjoergllvm/ADT/FoldingSet.h 204606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^ 204706f32e7eSjoerg 204806f32e7eSjoergFoldingSet is an aggregate class that is really good at uniquing 204906f32e7eSjoergexpensive-to-create or polymorphic objects. It is a combination of a chained 205006f32e7eSjoerghash table with intrusive links (uniqued objects are required to inherit from 205106f32e7eSjoergFoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID 205206f32e7eSjoergprocess. 205306f32e7eSjoerg 205406f32e7eSjoergConsider a case where you want to implement a "getOrCreateFoo" method for a 205506f32e7eSjoergcomplex object (for example, a node in the code generator). The client has a 205606f32e7eSjoergdescription of **what** it wants to generate (it knows the opcode and all the 205706f32e7eSjoergoperands), but we don't want to 'new' a node, then try inserting it into a set 205806f32e7eSjoergonly to find out it already exists, at which point we would have to delete it 205906f32e7eSjoergand return the node that already exists. 206006f32e7eSjoerg 206106f32e7eSjoergTo support this style of client, FoldingSet perform a query with a 206206f32e7eSjoergFoldingSetNodeID (which wraps SmallVector) that can be used to describe the 206306f32e7eSjoergelement that we want to query for. The query either returns the element 206406f32e7eSjoergmatching the ID or it returns an opaque ID that indicates where insertion should 206506f32e7eSjoergtake place. Construction of the ID usually does not require heap traffic. 206606f32e7eSjoerg 206706f32e7eSjoergBecause FoldingSet uses intrusive links, it can support polymorphic objects in 206806f32e7eSjoergthe set (for example, you can have SDNode instances mixed with LoadSDNodes). 206906f32e7eSjoergBecause the elements are individually allocated, pointers to the elements are 207006f32e7eSjoergstable: inserting or removing elements does not invalidate any pointers to other 207106f32e7eSjoergelements. 207206f32e7eSjoerg 207306f32e7eSjoerg.. _dss_set: 207406f32e7eSjoerg 207506f32e7eSjoerg<set> 207606f32e7eSjoerg^^^^^ 207706f32e7eSjoerg 207806f32e7eSjoerg``std::set`` is a reasonable all-around set class, which is decent at many 207906f32e7eSjoergthings but great at nothing. std::set allocates memory for each element 208006f32e7eSjoerginserted (thus it is very malloc intensive) and typically stores three pointers 208106f32e7eSjoergper element in the set (thus adding a large amount of per-element space 208206f32e7eSjoergoverhead). It offers guaranteed log(n) performance, which is not particularly 208306f32e7eSjoergfast from a complexity standpoint (particularly if the elements of the set are 208406f32e7eSjoergexpensive to compare, like strings), and has extremely high constant factors for 208506f32e7eSjoerglookup, insertion and removal. 208606f32e7eSjoerg 208706f32e7eSjoergThe advantages of std::set are that its iterators are stable (deleting or 208806f32e7eSjoerginserting an element from the set does not affect iterators or pointers to other 208906f32e7eSjoergelements) and that iteration over the set is guaranteed to be in sorted order. 209006f32e7eSjoergIf the elements in the set are large, then the relative overhead of the pointers 209106f32e7eSjoergand malloc traffic is not a big deal, but if the elements of the set are small, 209206f32e7eSjoergstd::set is almost never a good choice. 209306f32e7eSjoerg 209406f32e7eSjoerg.. _dss_setvector: 209506f32e7eSjoerg 209606f32e7eSjoergllvm/ADT/SetVector.h 209706f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 209806f32e7eSjoerg 209906f32e7eSjoergLLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a 210006f32e7eSjoergset-like container along with a :ref:`Sequential Container <ds_sequential>` The 210106f32e7eSjoergimportant property that this provides is efficient insertion with uniquing 210206f32e7eSjoerg(duplicate elements are ignored) with iteration support. It implements this by 210306f32e7eSjoerginserting elements into both a set-like container and the sequential container, 210406f32e7eSjoergusing the set-like container for uniquing and the sequential container for 210506f32e7eSjoergiteration. 210606f32e7eSjoerg 210706f32e7eSjoergThe difference between SetVector and other sets is that the order of iteration 210806f32e7eSjoergis guaranteed to match the order of insertion into the SetVector. This property 210906f32e7eSjoergis really important for things like sets of pointers. Because pointer values 211006f32e7eSjoergare non-deterministic (e.g. vary across runs of the program on different 211106f32e7eSjoergmachines), iterating over the pointers in the set will not be in a well-defined 211206f32e7eSjoergorder. 211306f32e7eSjoerg 211406f32e7eSjoergThe drawback of SetVector is that it requires twice as much space as a normal 211506f32e7eSjoergset and has the sum of constant factors from the set-like container and the 211606f32e7eSjoergsequential container that it uses. Use it **only** if you need to iterate over 211706f32e7eSjoergthe elements in a deterministic order. SetVector is also expensive to delete 211806f32e7eSjoergelements out of (linear time), unless you use its "pop_back" method, which is 211906f32e7eSjoergfaster. 212006f32e7eSjoerg 212106f32e7eSjoerg``SetVector`` is an adapter class that defaults to using ``std::vector`` and a 212206f32e7eSjoergsize 16 ``SmallSet`` for the underlying containers, so it is quite expensive. 212306f32e7eSjoergHowever, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class, 212406f32e7eSjoergwhich defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size. 212506f32e7eSjoergIf you use this, and if your sets are dynamically smaller than ``N``, you will 212606f32e7eSjoergsave a lot of heap traffic. 212706f32e7eSjoerg 212806f32e7eSjoerg.. _dss_uniquevector: 212906f32e7eSjoerg 213006f32e7eSjoergllvm/ADT/UniqueVector.h 213106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 213206f32e7eSjoerg 213306f32e7eSjoergUniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a 213406f32e7eSjoergunique ID for each element inserted into the set. It internally contains a map 213506f32e7eSjoergand a vector, and it assigns a unique ID for each value inserted into the set. 213606f32e7eSjoerg 213706f32e7eSjoergUniqueVector is very expensive: its cost is the sum of the cost of maintaining 213806f32e7eSjoergboth the map and vector, it has high complexity, high constant factors, and 213906f32e7eSjoergproduces a lot of malloc traffic. It should be avoided. 214006f32e7eSjoerg 214106f32e7eSjoerg.. _dss_immutableset: 214206f32e7eSjoerg 214306f32e7eSjoergllvm/ADT/ImmutableSet.h 214406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 214506f32e7eSjoerg 214606f32e7eSjoergImmutableSet is an immutable (functional) set implementation based on an AVL 214706f32e7eSjoergtree. Adding or removing elements is done through a Factory object and results 214806f32e7eSjoergin the creation of a new ImmutableSet object. If an ImmutableSet already exists 214906f32e7eSjoergwith the given contents, then the existing one is returned; equality is compared 215006f32e7eSjoergwith a FoldingSetNodeID. The time and space complexity of add or remove 215106f32e7eSjoergoperations is logarithmic in the size of the original set. 215206f32e7eSjoerg 215306f32e7eSjoergThere is no method for returning an element of the set, you can only check for 215406f32e7eSjoergmembership. 215506f32e7eSjoerg 215606f32e7eSjoerg.. _dss_otherset: 215706f32e7eSjoerg 215806f32e7eSjoergOther Set-Like Container Options 215906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 216006f32e7eSjoerg 216106f32e7eSjoergThe STL provides several other options, such as std::multiset and the various 216206f32e7eSjoerg"hash_set" like containers (whether from C++ TR1 or from the SGI library). We 216306f32e7eSjoergnever use hash_set and unordered_set because they are generally very expensive 216406f32e7eSjoerg(each insertion requires a malloc) and very non-portable. 216506f32e7eSjoerg 216606f32e7eSjoergstd::multiset is useful if you're not interested in elimination of duplicates, 216706f32e7eSjoergbut has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector 216806f32e7eSjoerg(where you don't delete duplicate entries) or some other approach is almost 216906f32e7eSjoergalways better. 217006f32e7eSjoerg 217106f32e7eSjoerg.. _ds_map: 217206f32e7eSjoerg 217306f32e7eSjoergMap-Like Containers (std::map, DenseMap, etc) 217406f32e7eSjoerg--------------------------------------------- 217506f32e7eSjoerg 217606f32e7eSjoergMap-like containers are useful when you want to associate data to a key. As 217706f32e7eSjoergusual, there are a lot of different ways to do this. :) 217806f32e7eSjoerg 217906f32e7eSjoerg.. _dss_sortedvectormap: 218006f32e7eSjoerg 218106f32e7eSjoergA sorted 'vector' 218206f32e7eSjoerg^^^^^^^^^^^^^^^^^ 218306f32e7eSjoerg 218406f32e7eSjoergIf your usage pattern follows a strict insert-then-query approach, you can 218506f32e7eSjoergtrivially use the same approach as :ref:`sorted vectors for set-like containers 218606f32e7eSjoerg<dss_sortedvectorset>`. The only difference is that your query function (which 218706f32e7eSjoerguses std::lower_bound to get efficient log(n) lookup) should only compare the 218806f32e7eSjoergkey, not both the key and value. This yields the same advantages as sorted 218906f32e7eSjoergvectors for sets. 219006f32e7eSjoerg 219106f32e7eSjoerg.. _dss_stringmap: 219206f32e7eSjoerg 219306f32e7eSjoergllvm/ADT/StringMap.h 219406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 219506f32e7eSjoerg 219606f32e7eSjoergStrings are commonly used as keys in maps, and they are difficult to support 219706f32e7eSjoergefficiently: they are variable length, inefficient to hash and compare when 219806f32e7eSjoerglong, expensive to copy, etc. StringMap is a specialized container designed to 219906f32e7eSjoergcope with these issues. It supports mapping an arbitrary range of bytes to an 220006f32e7eSjoergarbitrary other object. 220106f32e7eSjoerg 220206f32e7eSjoergThe StringMap implementation uses a quadratically-probed hash table, where the 220306f32e7eSjoergbuckets store a pointer to the heap allocated entries (and some other stuff). 220406f32e7eSjoergThe entries in the map must be heap allocated because the strings are variable 220506f32e7eSjoerglength. The string data (key) and the element object (value) are stored in the 220606f32e7eSjoergsame allocation with the string data immediately after the element object. 220706f32e7eSjoergThis container guarantees the "``(char*)(&Value+1)``" points to the key string 220806f32e7eSjoergfor a value. 220906f32e7eSjoerg 221006f32e7eSjoergThe StringMap is very fast for several reasons: quadratic probing is very cache 221106f32e7eSjoergefficient for lookups, the hash value of strings in buckets is not recomputed 221206f32e7eSjoergwhen looking up an element, StringMap rarely has to touch the memory for 221306f32e7eSjoergunrelated objects when looking up a value (even when hash collisions happen), 221406f32e7eSjoerghash table growth does not recompute the hash values for strings already in the 221506f32e7eSjoergtable, and each pair in the map is store in a single allocation (the string data 221606f32e7eSjoergis stored in the same allocation as the Value of a pair). 221706f32e7eSjoerg 221806f32e7eSjoergStringMap also provides query methods that take byte ranges, so it only ever 221906f32e7eSjoergcopies a string if a value is inserted into the table. 222006f32e7eSjoerg 222106f32e7eSjoergStringMap iteration order, however, is not guaranteed to be deterministic, so 222206f32e7eSjoergany uses which require that should instead use a std::map. 222306f32e7eSjoerg 222406f32e7eSjoerg.. _dss_indexmap: 222506f32e7eSjoerg 222606f32e7eSjoergllvm/ADT/IndexedMap.h 222706f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^ 222806f32e7eSjoerg 222906f32e7eSjoergIndexedMap is a specialized container for mapping small dense integers (or 223006f32e7eSjoergvalues that can be mapped to small dense integers) to some other type. It is 223106f32e7eSjoerginternally implemented as a vector with a mapping function that maps the keys 223206f32e7eSjoergto the dense integer range. 223306f32e7eSjoerg 223406f32e7eSjoergThis is useful for cases like virtual registers in the LLVM code generator: they 223506f32e7eSjoerghave a dense mapping that is offset by a compile-time constant (the first 223606f32e7eSjoergvirtual register ID). 223706f32e7eSjoerg 223806f32e7eSjoerg.. _dss_densemap: 223906f32e7eSjoerg 224006f32e7eSjoergllvm/ADT/DenseMap.h 224106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 224206f32e7eSjoerg 224306f32e7eSjoergDenseMap is a simple quadratically probed hash table. It excels at supporting 224406f32e7eSjoergsmall keys and values: it uses a single allocation to hold all of the pairs 224506f32e7eSjoergthat are currently inserted in the map. DenseMap is a great way to map 224606f32e7eSjoergpointers to pointers, or map other small types to each other. 224706f32e7eSjoerg 224806f32e7eSjoergThere are several aspects of DenseMap that you should be aware of, however. 224906f32e7eSjoergThe iterators in a DenseMap are invalidated whenever an insertion occurs, 225006f32e7eSjoergunlike map. Also, because DenseMap allocates space for a large number of 225106f32e7eSjoergkey/value pairs (it starts with 64 by default), it will waste a lot of space if 225206f32e7eSjoergyour keys or values are large. Finally, you must implement a partial 225306f32e7eSjoergspecialization of DenseMapInfo for the key that you want, if it isn't already 225406f32e7eSjoergsupported. This is required to tell DenseMap about two special marker values 225506f32e7eSjoerg(which can never be inserted into the map) that it needs internally. 225606f32e7eSjoerg 225706f32e7eSjoergDenseMap's find_as() method supports lookup operations using an alternate key 225806f32e7eSjoergtype. This is useful in cases where the normal key type is expensive to 225906f32e7eSjoergconstruct, but cheap to compare against. The DenseMapInfo is responsible for 226006f32e7eSjoergdefining the appropriate comparison and hashing methods for each alternate key 226106f32e7eSjoergtype used. 226206f32e7eSjoerg 226306f32e7eSjoerg.. _dss_valuemap: 226406f32e7eSjoerg 226506f32e7eSjoergllvm/IR/ValueMap.h 226606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^ 226706f32e7eSjoerg 226806f32e7eSjoergValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping 226906f32e7eSjoerg``Value*``\ s (or subclasses) to another type. When a Value is deleted or 227006f32e7eSjoergRAUW'ed, ValueMap will update itself so the new version of the key is mapped to 227106f32e7eSjoergthe same value, just as if the key were a WeakVH. You can configure exactly how 227206f32e7eSjoergthis happens, and what else happens on these two events, by passing a ``Config`` 227306f32e7eSjoergparameter to the ValueMap template. 227406f32e7eSjoerg 227506f32e7eSjoerg.. _dss_intervalmap: 227606f32e7eSjoerg 227706f32e7eSjoergllvm/ADT/IntervalMap.h 227806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^ 227906f32e7eSjoerg 228006f32e7eSjoergIntervalMap is a compact map for small keys and values. It maps key intervals 228106f32e7eSjoerginstead of single keys, and it will automatically coalesce adjacent intervals. 228206f32e7eSjoergWhen the map only contains a few intervals, they are stored in the map object 228306f32e7eSjoergitself to avoid allocations. 228406f32e7eSjoerg 228506f32e7eSjoergThe IntervalMap iterators are quite big, so they should not be passed around as 228606f32e7eSjoergSTL iterators. The heavyweight iterators allow a smaller data structure. 228706f32e7eSjoerg 228806f32e7eSjoerg.. _dss_map: 228906f32e7eSjoerg 229006f32e7eSjoerg<map> 229106f32e7eSjoerg^^^^^ 229206f32e7eSjoerg 229306f32e7eSjoergstd::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a 229406f32e7eSjoergsingle allocation per pair inserted into the map, it offers log(n) lookup with 229506f32e7eSjoergan extremely large constant factor, imposes a space penalty of 3 pointers per 229606f32e7eSjoergpair in the map, etc. 229706f32e7eSjoerg 229806f32e7eSjoergstd::map is most useful when your keys or values are very large, if you need to 229906f32e7eSjoergiterate over the collection in sorted order, or if you need stable iterators 230006f32e7eSjoerginto the map (i.e. they don't get invalidated if an insertion or deletion of 230106f32e7eSjoerganother element takes place). 230206f32e7eSjoerg 230306f32e7eSjoerg.. _dss_mapvector: 230406f32e7eSjoerg 230506f32e7eSjoergllvm/ADT/MapVector.h 230606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^ 230706f32e7eSjoerg 230806f32e7eSjoerg``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The 230906f32e7eSjoergmain difference is that the iteration order is guaranteed to be the insertion 231006f32e7eSjoergorder, making it an easy (but somewhat expensive) solution for non-deterministic 231106f32e7eSjoergiteration over maps of pointers. 231206f32e7eSjoerg 231306f32e7eSjoergIt is implemented by mapping from key to an index in a vector of key,value 231406f32e7eSjoergpairs. This provides fast lookup and iteration, but has two main drawbacks: 231506f32e7eSjoergthe key is stored twice and removing elements takes linear time. If it is 231606f32e7eSjoergnecessary to remove elements, it's best to remove them in bulk using 231706f32e7eSjoerg``remove_if()``. 231806f32e7eSjoerg 231906f32e7eSjoerg.. _dss_inteqclasses: 232006f32e7eSjoerg 232106f32e7eSjoergllvm/ADT/IntEqClasses.h 232206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 232306f32e7eSjoerg 232406f32e7eSjoergIntEqClasses provides a compact representation of equivalence classes of small 232506f32e7eSjoergintegers. Initially, each integer in the range 0..n-1 has its own equivalence 232606f32e7eSjoergclass. Classes can be joined by passing two class representatives to the 232706f32e7eSjoergjoin(a, b) method. Two integers are in the same class when findLeader() returns 232806f32e7eSjoergthe same representative. 232906f32e7eSjoerg 233006f32e7eSjoergOnce all equivalence classes are formed, the map can be compressed so each 233106f32e7eSjoerginteger 0..n-1 maps to an equivalence class number in the range 0..m-1, where m 233206f32e7eSjoergis the total number of equivalence classes. The map must be uncompressed before 233306f32e7eSjoergit can be edited again. 233406f32e7eSjoerg 233506f32e7eSjoerg.. _dss_immutablemap: 233606f32e7eSjoerg 233706f32e7eSjoergllvm/ADT/ImmutableMap.h 233806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 233906f32e7eSjoerg 234006f32e7eSjoergImmutableMap is an immutable (functional) map implementation based on an AVL 234106f32e7eSjoergtree. Adding or removing elements is done through a Factory object and results 234206f32e7eSjoergin the creation of a new ImmutableMap object. If an ImmutableMap already exists 234306f32e7eSjoergwith the given key set, then the existing one is returned; equality is compared 234406f32e7eSjoergwith a FoldingSetNodeID. The time and space complexity of add or remove 234506f32e7eSjoergoperations is logarithmic in the size of the original map. 234606f32e7eSjoerg 234706f32e7eSjoerg.. _dss_othermap: 234806f32e7eSjoerg 234906f32e7eSjoergOther Map-Like Container Options 235006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 235106f32e7eSjoerg 235206f32e7eSjoergThe STL provides several other options, such as std::multimap and the various 235306f32e7eSjoerg"hash_map" like containers (whether from C++ TR1 or from the SGI library). We 235406f32e7eSjoergnever use hash_set and unordered_set because they are generally very expensive 235506f32e7eSjoerg(each insertion requires a malloc) and very non-portable. 235606f32e7eSjoerg 235706f32e7eSjoergstd::multimap is useful if you want to map a key to multiple values, but has all 235806f32e7eSjoergthe drawbacks of std::map. A sorted vector or some other approach is almost 235906f32e7eSjoergalways better. 236006f32e7eSjoerg 236106f32e7eSjoerg.. _ds_bit: 236206f32e7eSjoerg 2363*da58b97aSjoergBit storage containers (BitVector, SparseBitVector, CoalescingBitVector) 2364*da58b97aSjoerg------------------------------------------------------------------------ 236506f32e7eSjoerg 2366*da58b97aSjoergThere are three bit storage containers, and choosing when to use each is 2367*da58b97aSjoergrelatively straightforward. 236806f32e7eSjoerg 236906f32e7eSjoergOne additional option is ``std::vector<bool>``: we discourage its use for two 237006f32e7eSjoergreasons 1) the implementation in many common compilers (e.g. commonly 237106f32e7eSjoergavailable versions of GCC) is extremely inefficient and 2) the C++ standards 237206f32e7eSjoergcommittee is likely to deprecate this container and/or change it significantly 237306f32e7eSjoergsomehow. In any case, please don't use it. 237406f32e7eSjoerg 237506f32e7eSjoerg.. _dss_bitvector: 237606f32e7eSjoerg 237706f32e7eSjoergBitVector 237806f32e7eSjoerg^^^^^^^^^ 237906f32e7eSjoerg 238006f32e7eSjoergThe BitVector container provides a dynamic size set of bits for manipulation. 238106f32e7eSjoergIt supports individual bit setting/testing, as well as set operations. The set 238206f32e7eSjoergoperations take time O(size of bitvector), but operations are performed one word 238306f32e7eSjoergat a time, instead of one bit at a time. This makes the BitVector very fast for 238406f32e7eSjoergset operations compared to other containers. Use the BitVector when you expect 238506f32e7eSjoergthe number of set bits to be high (i.e. a dense set). 238606f32e7eSjoerg 238706f32e7eSjoerg.. _dss_smallbitvector: 238806f32e7eSjoerg 238906f32e7eSjoergSmallBitVector 239006f32e7eSjoerg^^^^^^^^^^^^^^ 239106f32e7eSjoerg 239206f32e7eSjoergThe SmallBitVector container provides the same interface as BitVector, but it is 239306f32e7eSjoergoptimized for the case where only a small number of bits, less than 25 or so, 239406f32e7eSjoergare needed. It also transparently supports larger bit counts, but slightly less 239506f32e7eSjoergefficiently than a plain BitVector, so SmallBitVector should only be used when 239606f32e7eSjoerglarger counts are rare. 239706f32e7eSjoerg 239806f32e7eSjoergAt this time, SmallBitVector does not support set operations (and, or, xor), and 239906f32e7eSjoergits operator[] does not provide an assignable lvalue. 240006f32e7eSjoerg 240106f32e7eSjoerg.. _dss_sparsebitvector: 240206f32e7eSjoerg 240306f32e7eSjoergSparseBitVector 240406f32e7eSjoerg^^^^^^^^^^^^^^^ 240506f32e7eSjoerg 240606f32e7eSjoergThe SparseBitVector container is much like BitVector, with one major difference: 240706f32e7eSjoergOnly the bits that are set, are stored. This makes the SparseBitVector much 240806f32e7eSjoergmore space efficient than BitVector when the set is sparse, as well as making 240906f32e7eSjoergset operations O(number of set bits) instead of O(size of universe). The 241006f32e7eSjoergdownside to the SparseBitVector is that setting and testing of random bits is 241106f32e7eSjoergO(N), and on large SparseBitVectors, this can be slower than BitVector. In our 241206f32e7eSjoergimplementation, setting or testing bits in sorted order (either forwards or 241306f32e7eSjoergreverse) is O(1) worst case. Testing and setting bits within 128 bits (depends 241406f32e7eSjoergon size) of the current bit is also O(1). As a general statement, 241506f32e7eSjoergtesting/setting bits in a SparseBitVector is O(distance away from last set bit). 241606f32e7eSjoerg 2417*da58b97aSjoerg.. _dss_coalescingbitvector: 2418*da58b97aSjoerg 2419*da58b97aSjoergCoalescingBitVector 2420*da58b97aSjoerg^^^^^^^^^^^^^^^^^^^ 2421*da58b97aSjoerg 2422*da58b97aSjoergThe CoalescingBitVector container is similar in principle to a SparseBitVector, 2423*da58b97aSjoergbut is optimized to represent large contiguous ranges of set bits compactly. It 2424*da58b97aSjoergdoes this by coalescing contiguous ranges of set bits into intervals. Searching 2425*da58b97aSjoergfor a bit in a CoalescingBitVector is O(log(gaps between contiguous ranges)). 2426*da58b97aSjoerg 2427*da58b97aSjoergCoalescingBitVector is a better choice than BitVector when gaps between ranges 2428*da58b97aSjoergof set bits are large. It's a better choice than SparseBitVector when find() 2429*da58b97aSjoergoperations must have fast, predictable performance. However, it's not a good 2430*da58b97aSjoergchoice for representing sets which have lots of very short ranges. E.g. the set 2431*da58b97aSjoerg`{2*x : x \in [0, n)}` would be a pathological input. 2432*da58b97aSjoerg 243306f32e7eSjoerg.. _debugging: 243406f32e7eSjoerg 243506f32e7eSjoergDebugging 243606f32e7eSjoerg========= 243706f32e7eSjoerg 243806f32e7eSjoergA handful of `GDB pretty printers 243906f32e7eSjoerg<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are 244006f32e7eSjoergprovided for some of the core LLVM libraries. To use them, execute the 244106f32e7eSjoergfollowing (or add it to your ``~/.gdbinit``):: 244206f32e7eSjoerg 244306f32e7eSjoerg source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py 244406f32e7eSjoerg 244506f32e7eSjoergIt also might be handy to enable the `print pretty 244606f32e7eSjoerg<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to 244706f32e7eSjoergavoid data structures being printed as a big block of text. 244806f32e7eSjoerg 244906f32e7eSjoerg.. _common: 245006f32e7eSjoerg 245106f32e7eSjoergHelpful Hints for Common Operations 245206f32e7eSjoerg=================================== 245306f32e7eSjoerg 245406f32e7eSjoergThis section describes how to perform some very simple transformations of LLVM 245506f32e7eSjoergcode. This is meant to give examples of common idioms used, showing the 245606f32e7eSjoergpractical side of LLVM transformations. 245706f32e7eSjoerg 245806f32e7eSjoergBecause this is a "how-to" section, you should also read about the main classes 245906f32e7eSjoergthat you will be working with. The :ref:`Core LLVM Class Hierarchy Reference 246006f32e7eSjoerg<coreclasses>` contains details and descriptions of the main classes that you 246106f32e7eSjoergshould know about. 246206f32e7eSjoerg 246306f32e7eSjoerg.. _inspection: 246406f32e7eSjoerg 246506f32e7eSjoergBasic Inspection and Traversal Routines 246606f32e7eSjoerg--------------------------------------- 246706f32e7eSjoerg 246806f32e7eSjoergThe LLVM compiler infrastructure have many different data structures that may be 246906f32e7eSjoergtraversed. Following the example of the C++ standard template library, the 247006f32e7eSjoergtechniques used to traverse these various data structures are all basically the 2471*da58b97aSjoergsame. For an enumerable sequence of values, the ``XXXbegin()`` function (or 247206f32e7eSjoergmethod) returns an iterator to the start of the sequence, the ``XXXend()`` 247306f32e7eSjoergfunction returns an iterator pointing to one past the last valid element of the 247406f32e7eSjoergsequence, and there is some ``XXXiterator`` data type that is common between the 247506f32e7eSjoergtwo operations. 247606f32e7eSjoerg 247706f32e7eSjoergBecause the pattern for iteration is common across many different aspects of the 247806f32e7eSjoergprogram representation, the standard template library algorithms may be used on 247906f32e7eSjoergthem, and it is easier to remember how to iterate. First we show a few common 248006f32e7eSjoergexamples of the data structures that need to be traversed. Other data 248106f32e7eSjoergstructures are traversed in very similar ways. 248206f32e7eSjoerg 248306f32e7eSjoerg.. _iterate_function: 248406f32e7eSjoerg 248506f32e7eSjoergIterating over the ``BasicBlock`` in a ``Function`` 248606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 248706f32e7eSjoerg 248806f32e7eSjoergIt's quite common to have a ``Function`` instance that you'd like to transform 248906f32e7eSjoergin some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To 249006f32e7eSjoergfacilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that 249106f32e7eSjoergconstitute the ``Function``. The following is an example that prints the name 249206f32e7eSjoergof a ``BasicBlock`` and the number of ``Instruction``\ s it contains: 249306f32e7eSjoerg 249406f32e7eSjoerg.. code-block:: c++ 249506f32e7eSjoerg 249606f32e7eSjoerg Function &Func = ... 249706f32e7eSjoerg for (BasicBlock &BB : Func) 249806f32e7eSjoerg // Print out the name of the basic block if it has one, and then the 249906f32e7eSjoerg // number of instructions that it contains 250006f32e7eSjoerg errs() << "Basic block (name=" << BB.getName() << ") has " 250106f32e7eSjoerg << BB.size() << " instructions.\n"; 250206f32e7eSjoerg 250306f32e7eSjoerg.. _iterate_basicblock: 250406f32e7eSjoerg 250506f32e7eSjoergIterating over the ``Instruction`` in a ``BasicBlock`` 250606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 250706f32e7eSjoerg 250806f32e7eSjoergJust like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to 250906f32e7eSjoergiterate over the individual instructions that make up ``BasicBlock``\ s. Here's 251006f32e7eSjoerga code snippet that prints out each instruction in a ``BasicBlock``: 251106f32e7eSjoerg 251206f32e7eSjoerg.. code-block:: c++ 251306f32e7eSjoerg 251406f32e7eSjoerg BasicBlock& BB = ... 251506f32e7eSjoerg for (Instruction &I : BB) 251606f32e7eSjoerg // The next statement works since operator<<(ostream&,...) 251706f32e7eSjoerg // is overloaded for Instruction& 251806f32e7eSjoerg errs() << I << "\n"; 251906f32e7eSjoerg 252006f32e7eSjoerg 252106f32e7eSjoergHowever, this isn't really the best way to print out the contents of a 252206f32e7eSjoerg``BasicBlock``! Since the ostream operators are overloaded for virtually 252306f32e7eSjoerganything you'll care about, you could have just invoked the print routine on the 252406f32e7eSjoergbasic block itself: ``errs() << BB << "\n";``. 252506f32e7eSjoerg 252606f32e7eSjoerg.. _iterate_insiter: 252706f32e7eSjoerg 252806f32e7eSjoergIterating over the ``Instruction`` in a ``Function`` 252906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 253006f32e7eSjoerg 253106f32e7eSjoergIf you're finding that you commonly iterate over a ``Function``'s 253206f32e7eSjoerg``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s, 253306f32e7eSjoerg``InstIterator`` should be used instead. You'll need to include 253406f32e7eSjoerg``llvm/IR/InstIterator.h`` (`doxygen 2535*da58b97aSjoerg<https://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate 253606f32e7eSjoerg``InstIterator``\ s explicitly in your code. Here's a small example that shows 253706f32e7eSjoerghow to dump all instructions in a function to the standard error stream: 253806f32e7eSjoerg 253906f32e7eSjoerg.. code-block:: c++ 254006f32e7eSjoerg 254106f32e7eSjoerg #include "llvm/IR/InstIterator.h" 254206f32e7eSjoerg 254306f32e7eSjoerg // F is a pointer to a Function instance 254406f32e7eSjoerg for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 254506f32e7eSjoerg errs() << *I << "\n"; 254606f32e7eSjoerg 254706f32e7eSjoergEasy, isn't it? You can also use ``InstIterator``\ s to fill a work list with 254806f32e7eSjoergits initial contents. For example, if you wanted to initialize a work list to 254906f32e7eSjoergcontain all instructions in a ``Function`` F, all you would need to do is 255006f32e7eSjoergsomething like: 255106f32e7eSjoerg 255206f32e7eSjoerg.. code-block:: c++ 255306f32e7eSjoerg 255406f32e7eSjoerg std::set<Instruction*> worklist; 255506f32e7eSjoerg // or better yet, SmallPtrSet<Instruction*, 64> worklist; 255606f32e7eSjoerg 255706f32e7eSjoerg for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) 255806f32e7eSjoerg worklist.insert(&*I); 255906f32e7eSjoerg 256006f32e7eSjoergThe STL set ``worklist`` would now contain all instructions in the ``Function`` 256106f32e7eSjoergpointed to by F. 256206f32e7eSjoerg 256306f32e7eSjoerg.. _iterate_convert: 256406f32e7eSjoerg 256506f32e7eSjoergTurning an iterator into a class pointer (and vice-versa) 256606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 256706f32e7eSjoerg 256806f32e7eSjoergSometimes, it'll be useful to grab a reference (or pointer) to a class instance 256906f32e7eSjoergwhen all you've got at hand is an iterator. Well, extracting a reference or a 257006f32e7eSjoergpointer from an iterator is very straight-forward. Assuming that ``i`` is a 257106f32e7eSjoerg``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``: 257206f32e7eSjoerg 257306f32e7eSjoerg.. code-block:: c++ 257406f32e7eSjoerg 257506f32e7eSjoerg Instruction& inst = *i; // Grab reference to instruction reference 257606f32e7eSjoerg Instruction* pinst = &*i; // Grab pointer to instruction reference 257706f32e7eSjoerg const Instruction& inst = *j; 257806f32e7eSjoerg 257906f32e7eSjoergHowever, the iterators you'll be working with in the LLVM framework are special: 258006f32e7eSjoergthey will automatically convert to a ptr-to-instance type whenever they need to. 258106f32e7eSjoergInstead of dereferencing the iterator and then taking the address of the result, 258206f32e7eSjoergyou can simply assign the iterator to the proper pointer type and you get the 258306f32e7eSjoergdereference and address-of operation as a result of the assignment (behind the 258406f32e7eSjoergscenes, this is a result of overloading casting mechanisms). Thus the second 258506f32e7eSjoergline of the last example, 258606f32e7eSjoerg 258706f32e7eSjoerg.. code-block:: c++ 258806f32e7eSjoerg 258906f32e7eSjoerg Instruction *pinst = &*i; 259006f32e7eSjoerg 259106f32e7eSjoergis semantically equivalent to 259206f32e7eSjoerg 259306f32e7eSjoerg.. code-block:: c++ 259406f32e7eSjoerg 259506f32e7eSjoerg Instruction *pinst = i; 259606f32e7eSjoerg 259706f32e7eSjoergIt's also possible to turn a class pointer into the corresponding iterator, and 259806f32e7eSjoergthis is a constant time operation (very efficient). The following code snippet 259906f32e7eSjoergillustrates use of the conversion constructors provided by LLVM iterators. By 260006f32e7eSjoergusing these, you can explicitly grab the iterator of something without actually 260106f32e7eSjoergobtaining it via iteration over some structure: 260206f32e7eSjoerg 260306f32e7eSjoerg.. code-block:: c++ 260406f32e7eSjoerg 260506f32e7eSjoerg void printNextInstruction(Instruction* inst) { 260606f32e7eSjoerg BasicBlock::iterator it(inst); 260706f32e7eSjoerg ++it; // After this line, it refers to the instruction after *inst 260806f32e7eSjoerg if (it != inst->getParent()->end()) errs() << *it << "\n"; 260906f32e7eSjoerg } 261006f32e7eSjoerg 261106f32e7eSjoergUnfortunately, these implicit conversions come at a cost; they prevent these 261206f32e7eSjoergiterators from conforming to standard iterator conventions, and thus from being 261306f32e7eSjoergusable with standard algorithms and containers. For example, they prevent the 261406f32e7eSjoergfollowing code, where ``B`` is a ``BasicBlock``, from compiling: 261506f32e7eSjoerg 261606f32e7eSjoerg.. code-block:: c++ 261706f32e7eSjoerg 261806f32e7eSjoerg llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end()); 261906f32e7eSjoerg 262006f32e7eSjoergBecause of this, these implicit conversions may be removed some day, and 262106f32e7eSjoerg``operator*`` changed to return a pointer instead of a reference. 262206f32e7eSjoerg 262306f32e7eSjoerg.. _iterate_complex: 262406f32e7eSjoerg 262506f32e7eSjoergFinding call sites: a slightly more complex example 262606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 262706f32e7eSjoerg 262806f32e7eSjoergSay that you're writing a FunctionPass and would like to count all the locations 262906f32e7eSjoergin the entire module (that is, across every ``Function``) where a certain 263006f32e7eSjoergfunction (i.e., some ``Function *``) is already in scope. As you'll learn 263106f32e7eSjoerglater, you may want to use an ``InstVisitor`` to accomplish this in a much more 263206f32e7eSjoergstraight-forward manner, but this example will allow us to explore how you'd do 263306f32e7eSjoergit if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we 263406f32e7eSjoergwant to do: 263506f32e7eSjoerg 263606f32e7eSjoerg.. code-block:: none 263706f32e7eSjoerg 263806f32e7eSjoerg initialize callCounter to zero 263906f32e7eSjoerg for each Function f in the Module 264006f32e7eSjoerg for each BasicBlock b in f 264106f32e7eSjoerg for each Instruction i in b 2642*da58b97aSjoerg if (i a Call and calls the given function) 264306f32e7eSjoerg increment callCounter 264406f32e7eSjoerg 264506f32e7eSjoergAnd the actual code is (remember, because we're writing a ``FunctionPass``, our 264606f32e7eSjoerg``FunctionPass``-derived class simply has to override the ``runOnFunction`` 264706f32e7eSjoergmethod): 264806f32e7eSjoerg 264906f32e7eSjoerg.. code-block:: c++ 265006f32e7eSjoerg 265106f32e7eSjoerg Function* targetFunc = ...; 265206f32e7eSjoerg 265306f32e7eSjoerg class OurFunctionPass : public FunctionPass { 265406f32e7eSjoerg public: 265506f32e7eSjoerg OurFunctionPass(): callCounter(0) { } 265606f32e7eSjoerg 265706f32e7eSjoerg virtual runOnFunction(Function& F) { 265806f32e7eSjoerg for (BasicBlock &B : F) { 265906f32e7eSjoerg for (Instruction &I: B) { 2660*da58b97aSjoerg if (auto *CB = dyn_cast<CallBase>(&I)) { 2661*da58b97aSjoerg // We know we've encountered some kind of call instruction (call, 2662*da58b97aSjoerg // invoke, or callbr), so we need to determine if it's a call to 2663*da58b97aSjoerg // the function pointed to by m_func or not. 2664*da58b97aSjoerg if (CB->getCalledFunction() == targetFunc) 266506f32e7eSjoerg ++callCounter; 266606f32e7eSjoerg } 266706f32e7eSjoerg } 266806f32e7eSjoerg } 266906f32e7eSjoerg } 267006f32e7eSjoerg 267106f32e7eSjoerg private: 267206f32e7eSjoerg unsigned callCounter; 267306f32e7eSjoerg }; 267406f32e7eSjoerg 267506f32e7eSjoerg.. _iterate_chains: 267606f32e7eSjoerg 267706f32e7eSjoergIterating over def-use & use-def chains 267806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 267906f32e7eSjoerg 268006f32e7eSjoergFrequently, we might have an instance of the ``Value`` class (`doxygen 2681*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine 268206f32e7eSjoergwhich ``User``\ s use the ``Value``. The list of all ``User``\ s of a particular 268306f32e7eSjoerg``Value`` is called a *def-use* chain. For example, let's say we have a 268406f32e7eSjoerg``Function*`` named ``F`` to a particular function ``foo``. Finding all of the 268506f32e7eSjoerginstructions that *use* ``foo`` is as simple as iterating over the *def-use* 268606f32e7eSjoergchain of ``F``: 268706f32e7eSjoerg 268806f32e7eSjoerg.. code-block:: c++ 268906f32e7eSjoerg 269006f32e7eSjoerg Function *F = ...; 269106f32e7eSjoerg 269206f32e7eSjoerg for (User *U : F->users()) { 269306f32e7eSjoerg if (Instruction *Inst = dyn_cast<Instruction>(U)) { 269406f32e7eSjoerg errs() << "F is used in instruction:\n"; 269506f32e7eSjoerg errs() << *Inst << "\n"; 269606f32e7eSjoerg } 269706f32e7eSjoerg 269806f32e7eSjoergAlternatively, it's common to have an instance of the ``User`` Class (`doxygen 2699*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what 270006f32e7eSjoerg``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is 270106f32e7eSjoergknown as a *use-def* chain. Instances of class ``Instruction`` are common 270206f32e7eSjoerg``User`` s, so we might want to iterate over all of the values that a particular 270306f32e7eSjoerginstruction uses (that is, the operands of the particular ``Instruction``): 270406f32e7eSjoerg 270506f32e7eSjoerg.. code-block:: c++ 270606f32e7eSjoerg 270706f32e7eSjoerg Instruction *pi = ...; 270806f32e7eSjoerg 270906f32e7eSjoerg for (Use &U : pi->operands()) { 271006f32e7eSjoerg Value *v = U.get(); 271106f32e7eSjoerg // ... 271206f32e7eSjoerg } 271306f32e7eSjoerg 271406f32e7eSjoergDeclaring objects as ``const`` is an important tool of enforcing mutation free 271506f32e7eSjoergalgorithms (such as analyses, etc.). For this purpose above iterators come in 271606f32e7eSjoergconstant flavors as ``Value::const_use_iterator`` and 271706f32e7eSjoerg``Value::const_op_iterator``. They automatically arise when calling 271806f32e7eSjoerg``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively. 271906f32e7eSjoergUpon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns 272006f32e7eSjoergremain unchanged. 272106f32e7eSjoerg 272206f32e7eSjoerg.. _iterate_preds: 272306f32e7eSjoerg 272406f32e7eSjoergIterating over predecessors & successors of blocks 272506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 272606f32e7eSjoerg 272706f32e7eSjoergIterating over the predecessors and successors of a block is quite easy with the 272806f32e7eSjoergroutines defined in ``"llvm/IR/CFG.h"``. Just use code like this to 272906f32e7eSjoergiterate over all predecessors of BB: 273006f32e7eSjoerg 273106f32e7eSjoerg.. code-block:: c++ 273206f32e7eSjoerg 273306f32e7eSjoerg #include "llvm/IR/CFG.h" 273406f32e7eSjoerg BasicBlock *BB = ...; 273506f32e7eSjoerg 273606f32e7eSjoerg for (BasicBlock *Pred : predecessors(BB)) { 273706f32e7eSjoerg // ... 273806f32e7eSjoerg } 273906f32e7eSjoerg 274006f32e7eSjoergSimilarly, to iterate over successors use ``successors``. 274106f32e7eSjoerg 274206f32e7eSjoerg.. _simplechanges: 274306f32e7eSjoerg 274406f32e7eSjoergMaking simple changes 274506f32e7eSjoerg--------------------- 274606f32e7eSjoerg 274706f32e7eSjoergThere are some primitive transformation operations present in the LLVM 274806f32e7eSjoerginfrastructure that are worth knowing about. When performing transformations, 274906f32e7eSjoergit's fairly common to manipulate the contents of basic blocks. This section 275006f32e7eSjoergdescribes some of the common methods for doing so and gives example code. 275106f32e7eSjoerg 275206f32e7eSjoerg.. _schanges_creating: 275306f32e7eSjoerg 275406f32e7eSjoergCreating and inserting new ``Instruction``\ s 275506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 275606f32e7eSjoerg 275706f32e7eSjoerg*Instantiating Instructions* 275806f32e7eSjoerg 275906f32e7eSjoergCreation of ``Instruction``\ s is straight-forward: simply call the constructor 276006f32e7eSjoergfor the kind of instruction to instantiate and provide the necessary parameters. 276106f32e7eSjoergFor example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus: 276206f32e7eSjoerg 276306f32e7eSjoerg.. code-block:: c++ 276406f32e7eSjoerg 276506f32e7eSjoerg auto *ai = new AllocaInst(Type::Int32Ty); 276606f32e7eSjoerg 276706f32e7eSjoergwill create an ``AllocaInst`` instance that represents the allocation of one 276806f32e7eSjoerginteger in the current stack frame, at run time. Each ``Instruction`` subclass 276906f32e7eSjoergis likely to have varying default parameters which change the semantics of the 277006f32e7eSjoerginstruction, so refer to the `doxygen documentation for the subclass of 2771*da58b97aSjoergInstruction <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that 277206f32e7eSjoergyou're interested in instantiating. 277306f32e7eSjoerg 277406f32e7eSjoerg*Naming values* 277506f32e7eSjoerg 277606f32e7eSjoergIt is very useful to name the values of instructions when you're able to, as 277706f32e7eSjoergthis facilitates the debugging of your transformations. If you end up looking 277806f32e7eSjoergat generated LLVM machine code, you definitely want to have logical names 277906f32e7eSjoergassociated with the results of instructions! By supplying a value for the 278006f32e7eSjoerg``Name`` (default) parameter of the ``Instruction`` constructor, you associate a 278106f32e7eSjoerglogical name with the result of the instruction's execution at run time. For 278206f32e7eSjoergexample, say that I'm writing a transformation that dynamically allocates space 278306f32e7eSjoergfor an integer on the stack, and that integer is going to be used as some kind 278406f32e7eSjoergof index by some other code. To accomplish this, I place an ``AllocaInst`` at 278506f32e7eSjoergthe first point in the first ``BasicBlock`` of some ``Function``, and I'm 278606f32e7eSjoergintending to use it within the same ``Function``. I might do: 278706f32e7eSjoerg 278806f32e7eSjoerg.. code-block:: c++ 278906f32e7eSjoerg 279006f32e7eSjoerg auto *pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc"); 279106f32e7eSjoerg 279206f32e7eSjoergwhere ``indexLoc`` is now the logical name of the instruction's execution value, 279306f32e7eSjoergwhich is a pointer to an integer on the run time stack. 279406f32e7eSjoerg 279506f32e7eSjoerg*Inserting instructions* 279606f32e7eSjoerg 279706f32e7eSjoergThere are essentially three ways to insert an ``Instruction`` into an existing 279806f32e7eSjoergsequence of instructions that form a ``BasicBlock``: 279906f32e7eSjoerg 280006f32e7eSjoerg* Insertion into an explicit instruction list 280106f32e7eSjoerg 280206f32e7eSjoerg Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``, 280306f32e7eSjoerg and a newly-created instruction we wish to insert before ``*pi``, we do the 280406f32e7eSjoerg following: 280506f32e7eSjoerg 280606f32e7eSjoerg .. code-block:: c++ 280706f32e7eSjoerg 280806f32e7eSjoerg BasicBlock *pb = ...; 280906f32e7eSjoerg Instruction *pi = ...; 281006f32e7eSjoerg auto *newInst = new Instruction(...); 281106f32e7eSjoerg 281206f32e7eSjoerg pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb 281306f32e7eSjoerg 281406f32e7eSjoerg Appending to the end of a ``BasicBlock`` is so common that the ``Instruction`` 281506f32e7eSjoerg class and ``Instruction``-derived classes provide constructors which take a 281606f32e7eSjoerg pointer to a ``BasicBlock`` to be appended to. For example code that looked 281706f32e7eSjoerg like: 281806f32e7eSjoerg 281906f32e7eSjoerg .. code-block:: c++ 282006f32e7eSjoerg 282106f32e7eSjoerg BasicBlock *pb = ...; 282206f32e7eSjoerg auto *newInst = new Instruction(...); 282306f32e7eSjoerg 282406f32e7eSjoerg pb->getInstList().push_back(newInst); // Appends newInst to pb 282506f32e7eSjoerg 282606f32e7eSjoerg becomes: 282706f32e7eSjoerg 282806f32e7eSjoerg .. code-block:: c++ 282906f32e7eSjoerg 283006f32e7eSjoerg BasicBlock *pb = ...; 283106f32e7eSjoerg auto *newInst = new Instruction(..., pb); 283206f32e7eSjoerg 283306f32e7eSjoerg which is much cleaner, especially if you are creating long instruction 283406f32e7eSjoerg streams. 283506f32e7eSjoerg 283606f32e7eSjoerg* Insertion into an implicit instruction list 283706f32e7eSjoerg 283806f32e7eSjoerg ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly 283906f32e7eSjoerg associated with an existing instruction list: the instruction list of the 284006f32e7eSjoerg enclosing basic block. Thus, we could have accomplished the same thing as the 284106f32e7eSjoerg above code without being given a ``BasicBlock`` by doing: 284206f32e7eSjoerg 284306f32e7eSjoerg .. code-block:: c++ 284406f32e7eSjoerg 284506f32e7eSjoerg Instruction *pi = ...; 284606f32e7eSjoerg auto *newInst = new Instruction(...); 284706f32e7eSjoerg 284806f32e7eSjoerg pi->getParent()->getInstList().insert(pi, newInst); 284906f32e7eSjoerg 285006f32e7eSjoerg In fact, this sequence of steps occurs so frequently that the ``Instruction`` 285106f32e7eSjoerg class and ``Instruction``-derived classes provide constructors which take (as 285206f32e7eSjoerg a default parameter) a pointer to an ``Instruction`` which the newly-created 285306f32e7eSjoerg ``Instruction`` should precede. That is, ``Instruction`` constructors are 285406f32e7eSjoerg capable of inserting the newly-created instance into the ``BasicBlock`` of a 285506f32e7eSjoerg provided instruction, immediately before that instruction. Using an 285606f32e7eSjoerg ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the 285706f32e7eSjoerg above code becomes: 285806f32e7eSjoerg 285906f32e7eSjoerg .. code-block:: c++ 286006f32e7eSjoerg 286106f32e7eSjoerg Instruction* pi = ...; 286206f32e7eSjoerg auto *newInst = new Instruction(..., pi); 286306f32e7eSjoerg 286406f32e7eSjoerg which is much cleaner, especially if you're creating a lot of instructions and 286506f32e7eSjoerg adding them to ``BasicBlock``\ s. 286606f32e7eSjoerg 286706f32e7eSjoerg* Insertion using an instance of ``IRBuilder`` 286806f32e7eSjoerg 286906f32e7eSjoerg Inserting several ``Instruction``\ s can be quite laborious using the previous 287006f32e7eSjoerg methods. The ``IRBuilder`` is a convenience class that can be used to add 287106f32e7eSjoerg several instructions to the end of a ``BasicBlock`` or before a particular 287206f32e7eSjoerg ``Instruction``. It also supports constant folding and renaming named 287306f32e7eSjoerg registers (see ``IRBuilder``'s template arguments). 287406f32e7eSjoerg 287506f32e7eSjoerg The example below demonstrates a very simple use of the ``IRBuilder`` where 287606f32e7eSjoerg three instructions are inserted before the instruction ``pi``. The first two 287706f32e7eSjoerg instructions are Call instructions and third instruction multiplies the return 287806f32e7eSjoerg value of the two calls. 287906f32e7eSjoerg 288006f32e7eSjoerg .. code-block:: c++ 288106f32e7eSjoerg 288206f32e7eSjoerg Instruction *pi = ...; 288306f32e7eSjoerg IRBuilder<> Builder(pi); 288406f32e7eSjoerg CallInst* callOne = Builder.CreateCall(...); 288506f32e7eSjoerg CallInst* callTwo = Builder.CreateCall(...); 288606f32e7eSjoerg Value* result = Builder.CreateMul(callOne, callTwo); 288706f32e7eSjoerg 288806f32e7eSjoerg The example below is similar to the above example except that the created 288906f32e7eSjoerg ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``. 289006f32e7eSjoerg 289106f32e7eSjoerg .. code-block:: c++ 289206f32e7eSjoerg 289306f32e7eSjoerg BasicBlock *pb = ...; 289406f32e7eSjoerg IRBuilder<> Builder(pb); 289506f32e7eSjoerg CallInst* callOne = Builder.CreateCall(...); 289606f32e7eSjoerg CallInst* callTwo = Builder.CreateCall(...); 289706f32e7eSjoerg Value* result = Builder.CreateMul(callOne, callTwo); 289806f32e7eSjoerg 289906f32e7eSjoerg See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``. 290006f32e7eSjoerg 290106f32e7eSjoerg 290206f32e7eSjoerg.. _schanges_deleting: 290306f32e7eSjoerg 290406f32e7eSjoergDeleting Instructions 290506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^ 290606f32e7eSjoerg 290706f32e7eSjoergDeleting an instruction from an existing sequence of instructions that form a 290806f32e7eSjoergBasicBlock_ is very straight-forward: just call the instruction's 290906f32e7eSjoerg``eraseFromParent()`` method. For example: 291006f32e7eSjoerg 291106f32e7eSjoerg.. code-block:: c++ 291206f32e7eSjoerg 291306f32e7eSjoerg Instruction *I = .. ; 291406f32e7eSjoerg I->eraseFromParent(); 291506f32e7eSjoerg 291606f32e7eSjoergThis unlinks the instruction from its containing basic block and deletes it. If 291706f32e7eSjoergyou'd just like to unlink the instruction from its containing basic block but 291806f32e7eSjoergnot delete it, you can use the ``removeFromParent()`` method. 291906f32e7eSjoerg 292006f32e7eSjoerg.. _schanges_replacing: 292106f32e7eSjoerg 292206f32e7eSjoergReplacing an Instruction with another Value 292306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 292406f32e7eSjoerg 292506f32e7eSjoergReplacing individual instructions 292606f32e7eSjoerg""""""""""""""""""""""""""""""""" 292706f32e7eSjoerg 292806f32e7eSjoergIncluding "`llvm/Transforms/Utils/BasicBlockUtils.h 2929*da58b97aSjoerg<https://llvm.org/doxygen/BasicBlockUtils_8h_source.html>`_" permits use of two 293006f32e7eSjoergvery useful replace functions: ``ReplaceInstWithValue`` and 293106f32e7eSjoerg``ReplaceInstWithInst``. 293206f32e7eSjoerg 293306f32e7eSjoerg.. _schanges_deleting_sub: 293406f32e7eSjoerg 293506f32e7eSjoergDeleting Instructions 293606f32e7eSjoerg""""""""""""""""""""" 293706f32e7eSjoerg 293806f32e7eSjoerg* ``ReplaceInstWithValue`` 293906f32e7eSjoerg 294006f32e7eSjoerg This function replaces all uses of a given instruction with a value, and then 294106f32e7eSjoerg removes the original instruction. The following example illustrates the 294206f32e7eSjoerg replacement of the result of a particular ``AllocaInst`` that allocates memory 294306f32e7eSjoerg for a single integer with a null pointer to an integer. 294406f32e7eSjoerg 294506f32e7eSjoerg .. code-block:: c++ 294606f32e7eSjoerg 294706f32e7eSjoerg AllocaInst* instToReplace = ...; 294806f32e7eSjoerg BasicBlock::iterator ii(instToReplace); 294906f32e7eSjoerg 295006f32e7eSjoerg ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii, 295106f32e7eSjoerg Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty))); 295206f32e7eSjoerg 295306f32e7eSjoerg* ``ReplaceInstWithInst`` 295406f32e7eSjoerg 295506f32e7eSjoerg This function replaces a particular instruction with another instruction, 295606f32e7eSjoerg inserting the new instruction into the basic block at the location where the 295706f32e7eSjoerg old instruction was, and replacing any uses of the old instruction with the 295806f32e7eSjoerg new instruction. The following example illustrates the replacement of one 295906f32e7eSjoerg ``AllocaInst`` with another. 296006f32e7eSjoerg 296106f32e7eSjoerg .. code-block:: c++ 296206f32e7eSjoerg 296306f32e7eSjoerg AllocaInst* instToReplace = ...; 296406f32e7eSjoerg BasicBlock::iterator ii(instToReplace); 296506f32e7eSjoerg 296606f32e7eSjoerg ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii, 296706f32e7eSjoerg new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt")); 296806f32e7eSjoerg 296906f32e7eSjoerg 297006f32e7eSjoergReplacing multiple uses of Users and Values 297106f32e7eSjoerg""""""""""""""""""""""""""""""""""""""""""" 297206f32e7eSjoerg 297306f32e7eSjoergYou can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to 297406f32e7eSjoergchange more than one use at a time. See the doxygen documentation for the 2975*da58b97aSjoerg`Value Class <https://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class 2976*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more 297706f32e7eSjoerginformation. 297806f32e7eSjoerg 297906f32e7eSjoerg.. _schanges_deletingGV: 298006f32e7eSjoerg 298106f32e7eSjoergDeleting GlobalVariables 298206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^ 298306f32e7eSjoerg 298406f32e7eSjoergDeleting a global variable from a module is just as easy as deleting an 298506f32e7eSjoergInstruction. First, you must have a pointer to the global variable that you 298606f32e7eSjoergwish to delete. You use this pointer to erase it from its parent, the module. 298706f32e7eSjoergFor example: 298806f32e7eSjoerg 298906f32e7eSjoerg.. code-block:: c++ 299006f32e7eSjoerg 299106f32e7eSjoerg GlobalVariable *GV = .. ; 299206f32e7eSjoerg 299306f32e7eSjoerg GV->eraseFromParent(); 299406f32e7eSjoerg 299506f32e7eSjoerg 299606f32e7eSjoerg.. _threading: 299706f32e7eSjoerg 299806f32e7eSjoergThreads and LLVM 299906f32e7eSjoerg================ 300006f32e7eSjoerg 300106f32e7eSjoergThis section describes the interaction of the LLVM APIs with multithreading, 300206f32e7eSjoergboth on the part of client applications, and in the JIT, in the hosted 300306f32e7eSjoergapplication. 300406f32e7eSjoerg 300506f32e7eSjoergNote that LLVM's support for multithreading is still relatively young. Up 300606f32e7eSjoergthrough version 2.5, the execution of threaded hosted applications was 300706f32e7eSjoergsupported, but not threaded client access to the APIs. While this use case is 300806f32e7eSjoergnow supported, clients *must* adhere to the guidelines specified below to ensure 300906f32e7eSjoergproper operation in multithreaded mode. 301006f32e7eSjoerg 301106f32e7eSjoergNote that, on Unix-like platforms, LLVM requires the presence of GCC's atomic 301206f32e7eSjoergintrinsics in order to support threaded operation. If you need a 3013*da58b97aSjoergmultithreading-capable LLVM on a platform without a suitably modern system 301406f32e7eSjoergcompiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and 301506f32e7eSjoergusing the resultant compiler to build a copy of LLVM with multithreading 301606f32e7eSjoergsupport. 301706f32e7eSjoerg 301806f32e7eSjoerg.. _shutdown: 301906f32e7eSjoerg 302006f32e7eSjoergEnding Execution with ``llvm_shutdown()`` 302106f32e7eSjoerg----------------------------------------- 302206f32e7eSjoerg 302306f32e7eSjoergWhen you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to 302406f32e7eSjoergdeallocate memory used for internal structures. 302506f32e7eSjoerg 302606f32e7eSjoerg.. _managedstatic: 302706f32e7eSjoerg 302806f32e7eSjoergLazy Initialization with ``ManagedStatic`` 302906f32e7eSjoerg------------------------------------------ 303006f32e7eSjoerg 303106f32e7eSjoerg``ManagedStatic`` is a utility class in LLVM used to implement static 303206f32e7eSjoerginitialization of static resources, such as the global type tables. In a 303306f32e7eSjoergsingle-threaded environment, it implements a simple lazy initialization scheme. 303406f32e7eSjoergWhen LLVM is compiled with support for multi-threading, however, it uses 303506f32e7eSjoergdouble-checked locking to implement thread-safe lazy initialization. 303606f32e7eSjoerg 303706f32e7eSjoerg.. _llvmcontext: 303806f32e7eSjoerg 303906f32e7eSjoergAchieving Isolation with ``LLVMContext`` 304006f32e7eSjoerg---------------------------------------- 304106f32e7eSjoerg 304206f32e7eSjoerg``LLVMContext`` is an opaque class in the LLVM API which clients can use to 304306f32e7eSjoergoperate multiple, isolated instances of LLVM concurrently within the same 304406f32e7eSjoergaddress space. For instance, in a hypothetical compile-server, the compilation 304506f32e7eSjoergof an individual translation unit is conceptually independent from all the 304606f32e7eSjoergothers, and it would be desirable to be able to compile incoming translation 304706f32e7eSjoergunits concurrently on independent server threads. Fortunately, ``LLVMContext`` 304806f32e7eSjoergexists to enable just this kind of scenario! 304906f32e7eSjoerg 305006f32e7eSjoergConceptually, ``LLVMContext`` provides isolation. Every LLVM entity 305106f32e7eSjoerg(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's 305206f32e7eSjoergin-memory IR belongs to an ``LLVMContext``. Entities in different contexts 305306f32e7eSjoerg*cannot* interact with each other: ``Module``\ s in different contexts cannot be 305406f32e7eSjoerglinked together, ``Function``\ s cannot be added to ``Module``\ s in different 305506f32e7eSjoergcontexts, etc. What this means is that is safe to compile on multiple 305606f32e7eSjoergthreads simultaneously, as long as no two threads operate on entities within the 305706f32e7eSjoergsame context. 305806f32e7eSjoerg 305906f32e7eSjoergIn practice, very few places in the API require the explicit specification of a 306006f32e7eSjoerg``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every 306106f32e7eSjoerg``Type`` carries a reference to its owning context, most other entities can 306206f32e7eSjoergdetermine what context they belong to by looking at their own ``Type``. If you 306306f32e7eSjoergare adding new entities to LLVM IR, please try to maintain this interface 306406f32e7eSjoergdesign. 306506f32e7eSjoerg 306606f32e7eSjoerg.. _jitthreading: 306706f32e7eSjoerg 306806f32e7eSjoergThreads and the JIT 306906f32e7eSjoerg------------------- 307006f32e7eSjoerg 307106f32e7eSjoergLLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple 307206f32e7eSjoergthreads can call ``ExecutionEngine::getPointerToFunction()`` or 307306f32e7eSjoerg``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run 307406f32e7eSjoergcode output by the JIT concurrently. The user must still ensure that only one 307506f32e7eSjoergthread accesses IR in a given ``LLVMContext`` while another thread might be 307606f32e7eSjoergmodifying it. One way to do that is to always hold the JIT lock while accessing 307706f32e7eSjoergIR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s). 307806f32e7eSjoergAnother way is to only call ``getPointerToFunction()`` from the 307906f32e7eSjoerg``LLVMContext``'s thread. 308006f32e7eSjoerg 308106f32e7eSjoergWhen the JIT is configured to compile lazily (using 308206f32e7eSjoerg``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race 308306f32e7eSjoergcondition <https://bugs.llvm.org/show_bug.cgi?id=5184>`_ in updating call sites 308406f32e7eSjoergafter a function is lazily-jitted. It's still possible to use the lazy JIT in a 308506f32e7eSjoergthreaded program if you ensure that only one thread at a time can call any 308606f32e7eSjoergparticular lazy stub and that the JIT lock guards any IR access, but we suggest 308706f32e7eSjoergusing only the eager JIT in threaded programs. 308806f32e7eSjoerg 308906f32e7eSjoerg.. _advanced: 309006f32e7eSjoerg 309106f32e7eSjoergAdvanced Topics 309206f32e7eSjoerg=============== 309306f32e7eSjoerg 309406f32e7eSjoergThis section describes some of the advanced or obscure API's that most clients 309506f32e7eSjoergdo not need to be aware of. These API's tend manage the inner workings of the 309606f32e7eSjoergLLVM system, and only need to be accessed in unusual circumstances. 309706f32e7eSjoerg 309806f32e7eSjoerg.. _SymbolTable: 309906f32e7eSjoerg 310006f32e7eSjoergThe ``ValueSymbolTable`` class 310106f32e7eSjoerg------------------------------ 310206f32e7eSjoerg 310306f32e7eSjoergThe ``ValueSymbolTable`` (`doxygen 3104*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides 310506f32e7eSjoerga symbol table that the :ref:`Function <c_Function>` and Module_ classes use for 310606f32e7eSjoergnaming value definitions. The symbol table can provide a name for any Value_. 310706f32e7eSjoerg 310806f32e7eSjoergNote that the ``SymbolTable`` class should not be directly accessed by most 310906f32e7eSjoergclients. It should only be used when iteration over the symbol table names 311006f32e7eSjoergthemselves are required, which is very special purpose. Note that not all LLVM 311106f32e7eSjoergValue_\ s have names, and those without names (i.e. they have an empty name) do 311206f32e7eSjoergnot exist in the symbol table. 311306f32e7eSjoerg 311406f32e7eSjoergSymbol tables support iteration over the values in the symbol table with 311506f32e7eSjoerg``begin/end/iterator`` and supports querying to see if a specific name is in the 311606f32e7eSjoergsymbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no 311706f32e7eSjoergpublic mutator methods, instead, simply call ``setName`` on a value, which will 311806f32e7eSjoergautoinsert it into the appropriate symbol table. 311906f32e7eSjoerg 312006f32e7eSjoerg.. _UserLayout: 312106f32e7eSjoerg 312206f32e7eSjoergThe ``User`` and owned ``Use`` classes' memory layout 312306f32e7eSjoerg----------------------------------------------------- 312406f32e7eSjoerg 3125*da58b97aSjoergThe ``User`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1User.html>`__) 312606f32e7eSjoergclass provides a basis for expressing the ownership of ``User`` towards other 3127*da58b97aSjoerg`Value instance <https://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The 3128*da58b97aSjoerg``Use`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper 312906f32e7eSjoergclass is employed to do the bookkeeping and to facilitate *O(1)* addition and 313006f32e7eSjoergremoval. 313106f32e7eSjoerg 313206f32e7eSjoerg.. _Use2User: 313306f32e7eSjoerg 313406f32e7eSjoergInteraction and relationship between ``User`` and ``Use`` objects 313506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 313606f32e7eSjoerg 313706f32e7eSjoergA subclass of ``User`` can choose between incorporating its ``Use`` objects or 313806f32e7eSjoergrefer to them out-of-line by means of a pointer. A mixed variant (some ``Use`` 313906f32e7eSjoergs inline others hung off) is impractical and breaks the invariant that the 314006f32e7eSjoerg``Use`` objects belonging to the same ``User`` form a contiguous array. 314106f32e7eSjoerg 314206f32e7eSjoergWe have 2 different layouts in the ``User`` (sub)classes: 314306f32e7eSjoerg 314406f32e7eSjoerg* Layout a) 314506f32e7eSjoerg 314606f32e7eSjoerg The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User`` 314706f32e7eSjoerg object and there are a fixed number of them. 314806f32e7eSjoerg 314906f32e7eSjoerg* Layout b) 315006f32e7eSjoerg 315106f32e7eSjoerg The ``Use`` object(s) are referenced by a pointer to an array from the 315206f32e7eSjoerg ``User`` object and there may be a variable number of them. 315306f32e7eSjoerg 315406f32e7eSjoergAs of v2.4 each layout still possesses a direct pointer to the start of the 315506f32e7eSjoergarray of ``Use``\ s. Though not mandatory for layout a), we stick to this 315606f32e7eSjoergredundancy for the sake of simplicity. The ``User`` object also stores the 315706f32e7eSjoergnumber of ``Use`` objects it has. (Theoretically this information can also be 315806f32e7eSjoergcalculated given the scheme presented below.) 315906f32e7eSjoerg 316006f32e7eSjoergSpecial forms of allocation operators (``operator new``) enforce the following 316106f32e7eSjoergmemory layouts: 316206f32e7eSjoerg 316306f32e7eSjoerg* Layout a) is modelled by prepending the ``User`` object by the ``Use[]`` 316406f32e7eSjoerg array. 316506f32e7eSjoerg 316606f32e7eSjoerg .. code-block:: none 316706f32e7eSjoerg 316806f32e7eSjoerg ...---.---.---.---.-------... 316906f32e7eSjoerg | P | P | P | P | User 317006f32e7eSjoerg '''---'---'---'---'-------''' 317106f32e7eSjoerg 317206f32e7eSjoerg* Layout b) is modelled by pointing at the ``Use[]`` array. 317306f32e7eSjoerg 317406f32e7eSjoerg .. code-block:: none 317506f32e7eSjoerg 317606f32e7eSjoerg .-------... 317706f32e7eSjoerg | User 317806f32e7eSjoerg '-------''' 317906f32e7eSjoerg | 318006f32e7eSjoerg v 318106f32e7eSjoerg .---.---.---.---... 318206f32e7eSjoerg | P | P | P | P | 318306f32e7eSjoerg '---'---'---'---''' 318406f32e7eSjoerg 318506f32e7eSjoerg*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in 318606f32e7eSjoergeach* ``Use`` *object in the member* ``Use::Prev`` *)* 318706f32e7eSjoerg 318806f32e7eSjoerg.. _polymorphism: 318906f32e7eSjoerg 3190*da58b97aSjoergDesigning Type Hierarchies and Polymorphic Interfaces 319106f32e7eSjoerg----------------------------------------------------- 319206f32e7eSjoerg 319306f32e7eSjoergThere are two different design patterns that tend to result in the use of 319406f32e7eSjoergvirtual dispatch for methods in a type hierarchy in C++ programs. The first is 319506f32e7eSjoerga genuine type hierarchy where different types in the hierarchy model 319606f32e7eSjoerga specific subset of the functionality and semantics, and these types nest 319706f32e7eSjoergstrictly within each other. Good examples of this can be seen in the ``Value`` 319806f32e7eSjoergor ``Type`` type hierarchies. 319906f32e7eSjoerg 320006f32e7eSjoergA second is the desire to dispatch dynamically across a collection of 320106f32e7eSjoergpolymorphic interface implementations. This latter use case can be modeled with 320206f32e7eSjoergvirtual dispatch and inheritance by defining an abstract interface base class 320306f32e7eSjoergwhich all implementations derive from and override. However, this 320406f32e7eSjoergimplementation strategy forces an **"is-a"** relationship to exist that is not 320506f32e7eSjoergactually meaningful. There is often not some nested hierarchy of useful 320606f32e7eSjoerggeneralizations which code might interact with and move up and down. Instead, 320706f32e7eSjoergthere is a singular interface which is dispatched across a range of 320806f32e7eSjoergimplementations. 320906f32e7eSjoerg 321006f32e7eSjoergThe preferred implementation strategy for the second use case is that of 321106f32e7eSjoerggeneric programming (sometimes called "compile-time duck typing" or "static 321206f32e7eSjoergpolymorphism"). For example, a template over some type parameter ``T`` can be 321306f32e7eSjoerginstantiated across any particular implementation that conforms to the 321406f32e7eSjoerginterface or *concept*. A good example here is the highly generic properties of 321506f32e7eSjoergany type which models a node in a directed graph. LLVM models these primarily 321606f32e7eSjoergthrough templates and generic programming. Such templates include the 321706f32e7eSjoerg``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism 321806f32e7eSjoergtruly needs **dynamic** dispatch you can generalize it using a technique 321906f32e7eSjoergcalled *concept-based polymorphism*. This pattern emulates the interfaces and 322006f32e7eSjoergbehaviors of templates using a very limited form of virtual dispatch for type 322106f32e7eSjoergerasure inside its implementation. You can find examples of this technique in 322206f32e7eSjoergthe ``PassManager.h`` system, and there is a more detailed introduction to it 322306f32e7eSjoergby Sean Parent in several of his talks and papers: 322406f32e7eSjoerg 322506f32e7eSjoerg#. `Inheritance Is The Base Class of Evil 322606f32e7eSjoerg <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_ 322706f32e7eSjoerg - The GoingNative 2013 talk describing this technique, and probably the best 322806f32e7eSjoerg place to start. 322906f32e7eSjoerg#. `Value Semantics and Concepts-based Polymorphism 323006f32e7eSjoerg <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk 323106f32e7eSjoerg describing this technique in more detail. 323206f32e7eSjoerg#. `Sean Parent's Papers and Presentations 323306f32e7eSjoerg <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_ 3234*da58b97aSjoerg - A GitHub project full of links to slides, video, and sometimes code. 323506f32e7eSjoerg 323606f32e7eSjoergWhen deciding between creating a type hierarchy (with either tagged or virtual 323706f32e7eSjoergdispatch) and using templates or concepts-based polymorphism, consider whether 323806f32e7eSjoergthere is some refinement of an abstract base class which is a semantically 323906f32e7eSjoergmeaningful type on an interface boundary. If anything more refined than the 324006f32e7eSjoergroot abstract interface is meaningless to talk about as a partial extension of 324106f32e7eSjoergthe semantic model, then your use case likely fits better with polymorphism and 324206f32e7eSjoergyou should avoid using virtual dispatch. However, there may be some exigent 324306f32e7eSjoergcircumstances that require one technique or the other to be used. 324406f32e7eSjoerg 324506f32e7eSjoergIf you do need to introduce a type hierarchy, we prefer to use explicitly 324606f32e7eSjoergclosed type hierarchies with manual tagged dispatch and/or RTTI rather than the 324706f32e7eSjoergopen inheritance model and virtual dispatch that is more common in C++ code. 324806f32e7eSjoergThis is because LLVM rarely encourages library consumers to extend its core 324906f32e7eSjoergtypes, and leverages the closed and tag-dispatched nature of its hierarchies to 325006f32e7eSjoerggenerate significantly more efficient code. We have also found that a large 325106f32e7eSjoergamount of our usage of type hierarchies fits better with tag-based pattern 325206f32e7eSjoergmatching rather than dynamic dispatch across a common interface. Within LLVM we 325306f32e7eSjoerghave built custom helpers to facilitate this design. See this document's 325406f32e7eSjoergsection on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document 325506f32e7eSjoerg<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this 325606f32e7eSjoergpattern for use with the LLVM helpers. 325706f32e7eSjoerg 325806f32e7eSjoerg.. _abi_breaking_checks: 325906f32e7eSjoerg 326006f32e7eSjoergABI Breaking Checks 326106f32e7eSjoerg------------------- 326206f32e7eSjoerg 326306f32e7eSjoergChecks and asserts that alter the LLVM C++ ABI are predicated on the 326406f32e7eSjoergpreprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM 326506f32e7eSjoerglibraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI 326606f32e7eSjoergcompatible LLVM libraries built without it defined. By default, 326706f32e7eSjoergturning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS` 326806f32e7eSjoergso a default +Asserts build is not ABI compatible with a 326906f32e7eSjoergdefault -Asserts build. Clients that want ABI compatibility 327006f32e7eSjoergbetween +Asserts and -Asserts builds should use the CMake build system 327106f32e7eSjoergto set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently 327206f32e7eSjoergof `LLVM_ENABLE_ASSERTIONS`. 327306f32e7eSjoerg 327406f32e7eSjoerg.. _coreclasses: 327506f32e7eSjoerg 327606f32e7eSjoergThe Core LLVM Class Hierarchy Reference 327706f32e7eSjoerg======================================= 327806f32e7eSjoerg 327906f32e7eSjoerg``#include "llvm/IR/Type.h"`` 328006f32e7eSjoerg 3281*da58b97aSjoergheader source: `Type.h <https://llvm.org/doxygen/Type_8h_source.html>`_ 328206f32e7eSjoerg 3283*da58b97aSjoergdoxygen info: `Type Classes <https://llvm.org/doxygen/classllvm_1_1Type.html>`_ 328406f32e7eSjoerg 328506f32e7eSjoergThe Core LLVM classes are the primary means of representing the program being 328606f32e7eSjoerginspected or transformed. The core LLVM classes are defined in header files in 328706f32e7eSjoergthe ``include/llvm/IR`` directory, and implemented in the ``lib/IR`` 328806f32e7eSjoergdirectory. It's worth noting that, for historical reasons, this library is 328906f32e7eSjoergcalled ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect. 329006f32e7eSjoerg 329106f32e7eSjoerg.. _Type: 329206f32e7eSjoerg 329306f32e7eSjoergThe Type class and Derived Types 329406f32e7eSjoerg-------------------------------- 329506f32e7eSjoerg 329606f32e7eSjoerg``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``. 329706f32e7eSjoerg``Type`` cannot be instantiated directly but only through its subclasses. 329806f32e7eSjoergCertain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and 329906f32e7eSjoerg``DoubleType``) have hidden subclasses. They are hidden because they offer no 330006f32e7eSjoerguseful functionality beyond what the ``Type`` class offers except to distinguish 330106f32e7eSjoergthemselves from other subclasses of ``Type``. 330206f32e7eSjoerg 330306f32e7eSjoergAll other types are subclasses of ``DerivedType``. Types can be named, but this 330406f32e7eSjoergis not a requirement. There exists exactly one instance of a given shape at any 330506f32e7eSjoergone time. This allows type equality to be performed with address equality of 330606f32e7eSjoergthe Type Instance. That is, given two ``Type*`` values, the types are identical 330706f32e7eSjoergif the pointers are identical. 330806f32e7eSjoerg 330906f32e7eSjoerg.. _m_Type: 331006f32e7eSjoerg 331106f32e7eSjoergImportant Public Methods 331206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^ 331306f32e7eSjoerg 331406f32e7eSjoerg* ``bool isIntegerTy() const``: Returns true for any integer type. 331506f32e7eSjoerg 331606f32e7eSjoerg* ``bool isFloatingPointTy()``: Return true if this is one of the five 331706f32e7eSjoerg floating point types. 331806f32e7eSjoerg 331906f32e7eSjoerg* ``bool isSized()``: Return true if the type has known size. Things 332006f32e7eSjoerg that don't have a size are abstract types, labels and void. 332106f32e7eSjoerg 332206f32e7eSjoerg.. _derivedtypes: 332306f32e7eSjoerg 332406f32e7eSjoergImportant Derived Types 332506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^ 332606f32e7eSjoerg 332706f32e7eSjoerg``IntegerType`` 332806f32e7eSjoerg Subclass of DerivedType that represents integer types of any bit width. Any 332906f32e7eSjoerg bit width between ``IntegerType::MIN_INT_BITS`` (1) and 333006f32e7eSjoerg ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented. 333106f32e7eSjoerg 333206f32e7eSjoerg * ``static const IntegerType* get(unsigned NumBits)``: get an integer 333306f32e7eSjoerg type of a specific bit width. 333406f32e7eSjoerg 333506f32e7eSjoerg * ``unsigned getBitWidth() const``: Get the bit width of an integer type. 333606f32e7eSjoerg 333706f32e7eSjoerg``SequentialType`` 333806f32e7eSjoerg This is subclassed by ArrayType and VectorType. 333906f32e7eSjoerg 334006f32e7eSjoerg * ``const Type * getElementType() const``: Returns the type of each 334106f32e7eSjoerg of the elements in the sequential type. 334206f32e7eSjoerg 334306f32e7eSjoerg * ``uint64_t getNumElements() const``: Returns the number of elements 334406f32e7eSjoerg in the sequential type. 334506f32e7eSjoerg 334606f32e7eSjoerg``ArrayType`` 334706f32e7eSjoerg This is a subclass of SequentialType and defines the interface for array 334806f32e7eSjoerg types. 334906f32e7eSjoerg 335006f32e7eSjoerg``PointerType`` 335106f32e7eSjoerg Subclass of Type for pointer types. 335206f32e7eSjoerg 335306f32e7eSjoerg``VectorType`` 335406f32e7eSjoerg Subclass of SequentialType for vector types. A vector type is similar to an 335506f32e7eSjoerg ArrayType but is distinguished because it is a first class type whereas 335606f32e7eSjoerg ArrayType is not. Vector types are used for vector operations and are usually 335706f32e7eSjoerg small vectors of an integer or floating point type. 335806f32e7eSjoerg 335906f32e7eSjoerg``StructType`` 336006f32e7eSjoerg Subclass of DerivedTypes for struct types. 336106f32e7eSjoerg 336206f32e7eSjoerg.. _FunctionType: 336306f32e7eSjoerg 336406f32e7eSjoerg``FunctionType`` 336506f32e7eSjoerg Subclass of DerivedTypes for function types. 336606f32e7eSjoerg 336706f32e7eSjoerg * ``bool isVarArg() const``: Returns true if it's a vararg function. 336806f32e7eSjoerg 336906f32e7eSjoerg * ``const Type * getReturnType() const``: Returns the return type of the 337006f32e7eSjoerg function. 337106f32e7eSjoerg 337206f32e7eSjoerg * ``const Type * getParamType (unsigned i)``: Returns the type of the ith 337306f32e7eSjoerg parameter. 337406f32e7eSjoerg 337506f32e7eSjoerg * ``const unsigned getNumParams() const``: Returns the number of formal 337606f32e7eSjoerg parameters. 337706f32e7eSjoerg 337806f32e7eSjoerg.. _Module: 337906f32e7eSjoerg 338006f32e7eSjoergThe ``Module`` class 338106f32e7eSjoerg-------------------- 338206f32e7eSjoerg 338306f32e7eSjoerg``#include "llvm/IR/Module.h"`` 338406f32e7eSjoerg 3385*da58b97aSjoergheader source: `Module.h <https://llvm.org/doxygen/Module_8h_source.html>`_ 338606f32e7eSjoerg 3387*da58b97aSjoergdoxygen info: `Module Class <https://llvm.org/doxygen/classllvm_1_1Module.html>`_ 338806f32e7eSjoerg 338906f32e7eSjoergThe ``Module`` class represents the top level structure present in LLVM 339006f32e7eSjoergprograms. An LLVM module is effectively either a translation unit of the 339106f32e7eSjoergoriginal program or a combination of several translation units merged by the 339206f32e7eSjoerglinker. The ``Module`` class keeps track of a list of :ref:`Function 339306f32e7eSjoerg<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_. 339406f32e7eSjoergAdditionally, it contains a few helpful member functions that try to make common 339506f32e7eSjoergoperations easy. 339606f32e7eSjoerg 339706f32e7eSjoerg.. _m_Module: 339806f32e7eSjoerg 339906f32e7eSjoergImportant Public Members of the ``Module`` class 340006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 340106f32e7eSjoerg 340206f32e7eSjoerg* ``Module::Module(std::string name = "")`` 340306f32e7eSjoerg 340406f32e7eSjoerg Constructing a Module_ is easy. You can optionally provide a name for it 340506f32e7eSjoerg (probably based on the name of the translation unit). 340606f32e7eSjoerg 340706f32e7eSjoerg* | ``Module::iterator`` - Typedef for function list iterator 340806f32e7eSjoerg | ``Module::const_iterator`` - Typedef for const_iterator. 340906f32e7eSjoerg | ``begin()``, ``end()``, ``size()``, ``empty()`` 341006f32e7eSjoerg 341106f32e7eSjoerg These are forwarding methods that make it easy to access the contents of a 341206f32e7eSjoerg ``Module`` object's :ref:`Function <c_Function>` list. 341306f32e7eSjoerg 341406f32e7eSjoerg* ``Module::FunctionListType &getFunctionList()`` 341506f32e7eSjoerg 341606f32e7eSjoerg Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use 341706f32e7eSjoerg when you need to update the list or perform a complex action that doesn't have 341806f32e7eSjoerg a forwarding method. 341906f32e7eSjoerg 342006f32e7eSjoerg---------------- 342106f32e7eSjoerg 342206f32e7eSjoerg* | ``Module::global_iterator`` - Typedef for global variable list iterator 342306f32e7eSjoerg | ``Module::const_global_iterator`` - Typedef for const_iterator. 342406f32e7eSjoerg | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()`` 342506f32e7eSjoerg 342606f32e7eSjoerg These are forwarding methods that make it easy to access the contents of a 342706f32e7eSjoerg ``Module`` object's GlobalVariable_ list. 342806f32e7eSjoerg 342906f32e7eSjoerg* ``Module::GlobalListType &getGlobalList()`` 343006f32e7eSjoerg 343106f32e7eSjoerg Returns the list of GlobalVariable_\ s. This is necessary to use when you 343206f32e7eSjoerg need to update the list or perform a complex action that doesn't have a 343306f32e7eSjoerg forwarding method. 343406f32e7eSjoerg 343506f32e7eSjoerg---------------- 343606f32e7eSjoerg 343706f32e7eSjoerg* ``SymbolTable *getSymbolTable()`` 343806f32e7eSjoerg 343906f32e7eSjoerg Return a reference to the SymbolTable_ for this ``Module``. 344006f32e7eSjoerg 344106f32e7eSjoerg---------------- 344206f32e7eSjoerg 344306f32e7eSjoerg* ``Function *getFunction(StringRef Name) const`` 344406f32e7eSjoerg 344506f32e7eSjoerg Look up the specified function in the ``Module`` SymbolTable_. If it does not 344606f32e7eSjoerg exist, return ``null``. 344706f32e7eSjoerg 344806f32e7eSjoerg* ``FunctionCallee getOrInsertFunction(const std::string &Name, 344906f32e7eSjoerg const FunctionType *T)`` 345006f32e7eSjoerg 345106f32e7eSjoerg Look up the specified function in the ``Module`` SymbolTable_. If 345206f32e7eSjoerg it does not exist, add an external declaration for the function and 345306f32e7eSjoerg return it. Note that the function signature already present may not 345406f32e7eSjoerg match the requested signature. Thus, in order to enable the common 345506f32e7eSjoerg usage of passing the result directly to EmitCall, the return type is 345606f32e7eSjoerg a struct of ``{FunctionType *T, Constant *FunctionPtr}``, rather 345706f32e7eSjoerg than simply the ``Function*`` with potentially an unexpected 345806f32e7eSjoerg signature. 345906f32e7eSjoerg 346006f32e7eSjoerg* ``std::string getTypeName(const Type *Ty)`` 346106f32e7eSjoerg 346206f32e7eSjoerg If there is at least one entry in the SymbolTable_ for the specified Type_, 346306f32e7eSjoerg return it. Otherwise return the empty string. 346406f32e7eSjoerg 346506f32e7eSjoerg* ``bool addTypeName(const std::string &Name, const Type *Ty)`` 346606f32e7eSjoerg 346706f32e7eSjoerg Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is 346806f32e7eSjoerg already an entry for this name, true is returned and the SymbolTable_ is not 346906f32e7eSjoerg modified. 347006f32e7eSjoerg 347106f32e7eSjoerg.. _Value: 347206f32e7eSjoerg 347306f32e7eSjoergThe ``Value`` class 347406f32e7eSjoerg------------------- 347506f32e7eSjoerg 347606f32e7eSjoerg``#include "llvm/IR/Value.h"`` 347706f32e7eSjoerg 3478*da58b97aSjoergheader source: `Value.h <https://llvm.org/doxygen/Value_8h_source.html>`_ 347906f32e7eSjoerg 3480*da58b97aSjoergdoxygen info: `Value Class <https://llvm.org/doxygen/classllvm_1_1Value.html>`_ 348106f32e7eSjoerg 348206f32e7eSjoergThe ``Value`` class is the most important class in the LLVM Source base. It 348306f32e7eSjoergrepresents a typed value that may be used (among other things) as an operand to 348406f32e7eSjoergan instruction. There are many different types of ``Value``\ s, such as 348506f32e7eSjoergConstant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function 348606f32e7eSjoerg<c_Function>`\ s are ``Value``\ s. 348706f32e7eSjoerg 348806f32e7eSjoergA particular ``Value`` may be used many times in the LLVM representation for a 348906f32e7eSjoergprogram. For example, an incoming argument to a function (represented with an 349006f32e7eSjoerginstance of the Argument_ class) is "used" by every instruction in the function 349106f32e7eSjoergthat references the argument. To keep track of this relationship, the ``Value`` 349206f32e7eSjoergclass keeps a list of all of the ``User``\ s that is using it (the User_ class 349306f32e7eSjoergis a base class for all nodes in the LLVM graph that can refer to ``Value``\ s). 349406f32e7eSjoergThis use list is how LLVM represents def-use information in the program, and is 349506f32e7eSjoergaccessible through the ``use_*`` methods, shown below. 349606f32e7eSjoerg 349706f32e7eSjoergBecause LLVM is a typed representation, every LLVM ``Value`` is typed, and this 349806f32e7eSjoergType_ is available through the ``getType()`` method. In addition, all LLVM 349906f32e7eSjoergvalues can be named. The "name" of the ``Value`` is a symbolic string printed 350006f32e7eSjoergin the LLVM code: 350106f32e7eSjoerg 350206f32e7eSjoerg.. code-block:: llvm 350306f32e7eSjoerg 350406f32e7eSjoerg %foo = add i32 1, 2 350506f32e7eSjoerg 350606f32e7eSjoerg.. _nameWarning: 350706f32e7eSjoerg 350806f32e7eSjoergThe name of this instruction is "foo". **NOTE** that the name of any value may 350906f32e7eSjoergbe missing (an empty string), so names should **ONLY** be used for debugging 351006f32e7eSjoerg(making the source code easier to read, debugging printouts), they should not be 351106f32e7eSjoergused to keep track of values or map between them. For this purpose, use a 351206f32e7eSjoerg``std::map`` of pointers to the ``Value`` itself instead. 351306f32e7eSjoerg 351406f32e7eSjoergOne important aspect of LLVM is that there is no distinction between an SSA 351506f32e7eSjoergvariable and the operation that produces it. Because of this, any reference to 351606f32e7eSjoergthe value produced by an instruction (or the value available as an incoming 351706f32e7eSjoergargument, for example) is represented as a direct pointer to the instance of the 351806f32e7eSjoergclass that represents this value. Although this may take some getting used to, 351906f32e7eSjoergit simplifies the representation and makes it easier to manipulate. 352006f32e7eSjoerg 352106f32e7eSjoerg.. _m_Value: 352206f32e7eSjoerg 352306f32e7eSjoergImportant Public Members of the ``Value`` class 352406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 352506f32e7eSjoerg 352606f32e7eSjoerg* | ``Value::use_iterator`` - Typedef for iterator over the use-list 352706f32e7eSjoerg | ``Value::const_use_iterator`` - Typedef for const_iterator over the 352806f32e7eSjoerg use-list 352906f32e7eSjoerg | ``unsigned use_size()`` - Returns the number of users of the value. 353006f32e7eSjoerg | ``bool use_empty()`` - Returns true if there are no users. 353106f32e7eSjoerg | ``use_iterator use_begin()`` - Get an iterator to the start of the 353206f32e7eSjoerg use-list. 353306f32e7eSjoerg | ``use_iterator use_end()`` - Get an iterator to the end of the use-list. 353406f32e7eSjoerg | ``User *use_back()`` - Returns the last element in the list. 353506f32e7eSjoerg 353606f32e7eSjoerg These methods are the interface to access the def-use information in LLVM. 353706f32e7eSjoerg As with all other iterators in LLVM, the naming conventions follow the 353806f32e7eSjoerg conventions defined by the STL_. 353906f32e7eSjoerg 354006f32e7eSjoerg* ``Type *getType() const`` 354106f32e7eSjoerg This method returns the Type of the Value. 354206f32e7eSjoerg 354306f32e7eSjoerg* | ``bool hasName() const`` 354406f32e7eSjoerg | ``std::string getName() const`` 354506f32e7eSjoerg | ``void setName(const std::string &Name)`` 354606f32e7eSjoerg 354706f32e7eSjoerg This family of methods is used to access and assign a name to a ``Value``, be 354806f32e7eSjoerg aware of the :ref:`precaution above <nameWarning>`. 354906f32e7eSjoerg 355006f32e7eSjoerg* ``void replaceAllUsesWith(Value *V)`` 355106f32e7eSjoerg 355206f32e7eSjoerg This method traverses the use list of a ``Value`` changing all User_\ s of the 355306f32e7eSjoerg current value to refer to "``V``" instead. For example, if you detect that an 355406f32e7eSjoerg instruction always produces a constant value (for example through constant 355506f32e7eSjoerg folding), you can replace all uses of the instruction with the constant like 355606f32e7eSjoerg this: 355706f32e7eSjoerg 355806f32e7eSjoerg .. code-block:: c++ 355906f32e7eSjoerg 356006f32e7eSjoerg Inst->replaceAllUsesWith(ConstVal); 356106f32e7eSjoerg 356206f32e7eSjoerg.. _User: 356306f32e7eSjoerg 356406f32e7eSjoergThe ``User`` class 356506f32e7eSjoerg------------------ 356606f32e7eSjoerg 356706f32e7eSjoerg``#include "llvm/IR/User.h"`` 356806f32e7eSjoerg 3569*da58b97aSjoergheader source: `User.h <https://llvm.org/doxygen/User_8h_source.html>`_ 357006f32e7eSjoerg 3571*da58b97aSjoergdoxygen info: `User Class <https://llvm.org/doxygen/classllvm_1_1User.html>`_ 357206f32e7eSjoerg 357306f32e7eSjoergSuperclass: Value_ 357406f32e7eSjoerg 357506f32e7eSjoergThe ``User`` class is the common base class of all LLVM nodes that may refer to 357606f32e7eSjoerg``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s 357706f32e7eSjoergthat the User is referring to. The ``User`` class itself is a subclass of 357806f32e7eSjoerg``Value``. 357906f32e7eSjoerg 358006f32e7eSjoergThe operands of a ``User`` point directly to the LLVM ``Value`` that it refers 358106f32e7eSjoergto. Because LLVM uses Static Single Assignment (SSA) form, there can only be 358206f32e7eSjoergone definition referred to, allowing this direct connection. This connection 358306f32e7eSjoergprovides the use-def information in LLVM. 358406f32e7eSjoerg 358506f32e7eSjoerg.. _m_User: 358606f32e7eSjoerg 358706f32e7eSjoergImportant Public Members of the ``User`` class 358806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 358906f32e7eSjoerg 359006f32e7eSjoergThe ``User`` class exposes the operand list in two ways: through an index access 359106f32e7eSjoerginterface and through an iterator based interface. 359206f32e7eSjoerg 359306f32e7eSjoerg* | ``Value *getOperand(unsigned i)`` 359406f32e7eSjoerg | ``unsigned getNumOperands()`` 359506f32e7eSjoerg 359606f32e7eSjoerg These two methods expose the operands of the ``User`` in a convenient form for 359706f32e7eSjoerg direct access. 359806f32e7eSjoerg 359906f32e7eSjoerg* | ``User::op_iterator`` - Typedef for iterator over the operand list 360006f32e7eSjoerg | ``op_iterator op_begin()`` - Get an iterator to the start of the operand 360106f32e7eSjoerg list. 360206f32e7eSjoerg | ``op_iterator op_end()`` - Get an iterator to the end of the operand list. 360306f32e7eSjoerg 360406f32e7eSjoerg Together, these methods make up the iterator based interface to the operands 360506f32e7eSjoerg of a ``User``. 360606f32e7eSjoerg 360706f32e7eSjoerg 360806f32e7eSjoerg.. _Instruction: 360906f32e7eSjoerg 361006f32e7eSjoergThe ``Instruction`` class 361106f32e7eSjoerg------------------------- 361206f32e7eSjoerg 361306f32e7eSjoerg``#include "llvm/IR/Instruction.h"`` 361406f32e7eSjoerg 361506f32e7eSjoergheader source: `Instruction.h 3616*da58b97aSjoerg<https://llvm.org/doxygen/Instruction_8h_source.html>`_ 361706f32e7eSjoerg 361806f32e7eSjoergdoxygen info: `Instruction Class 3619*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ 362006f32e7eSjoerg 362106f32e7eSjoergSuperclasses: User_, Value_ 362206f32e7eSjoerg 362306f32e7eSjoergThe ``Instruction`` class is the common base class for all LLVM instructions. 362406f32e7eSjoergIt provides only a few methods, but is a very commonly used class. The primary 362506f32e7eSjoergdata tracked by the ``Instruction`` class itself is the opcode (instruction 362606f32e7eSjoergtype) and the parent BasicBlock_ the ``Instruction`` is embedded into. To 362706f32e7eSjoergrepresent a specific type of instruction, one of many subclasses of 362806f32e7eSjoerg``Instruction`` are used. 362906f32e7eSjoerg 363006f32e7eSjoergBecause the ``Instruction`` class subclasses the User_ class, its operands can 363106f32e7eSjoergbe accessed in the same way as for other ``User``\ s (with the 363206f32e7eSjoerg``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods). 363306f32e7eSjoergAn important file for the ``Instruction`` class is the ``llvm/Instruction.def`` 363406f32e7eSjoergfile. This file contains some meta-data about the various different types of 363506f32e7eSjoerginstructions in LLVM. It describes the enum values that are used as opcodes 363606f32e7eSjoerg(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the 363706f32e7eSjoergconcrete sub-classes of ``Instruction`` that implement the instruction (for 363806f32e7eSjoergexample BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this 363906f32e7eSjoergfile confuses doxygen, so these enum values don't show up correctly in the 3640*da58b97aSjoerg`doxygen output <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_. 364106f32e7eSjoerg 364206f32e7eSjoerg.. _s_Instruction: 364306f32e7eSjoerg 364406f32e7eSjoergImportant Subclasses of the ``Instruction`` class 364506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 364606f32e7eSjoerg 364706f32e7eSjoerg.. _BinaryOperator: 364806f32e7eSjoerg 364906f32e7eSjoerg* ``BinaryOperator`` 365006f32e7eSjoerg 365106f32e7eSjoerg This subclasses represents all two operand instructions whose operands must be 365206f32e7eSjoerg the same type, except for the comparison instructions. 365306f32e7eSjoerg 365406f32e7eSjoerg.. _CastInst: 365506f32e7eSjoerg 365606f32e7eSjoerg* ``CastInst`` 365706f32e7eSjoerg This subclass is the parent of the 12 casting instructions. It provides 365806f32e7eSjoerg common operations on cast instructions. 365906f32e7eSjoerg 366006f32e7eSjoerg.. _CmpInst: 366106f32e7eSjoerg 366206f32e7eSjoerg* ``CmpInst`` 366306f32e7eSjoerg 366406f32e7eSjoerg This subclass represents the two comparison instructions, 3665*da58b97aSjoerg `ICmpInst <LangRef.html#i_icmp>`_ (integer operands), and 366606f32e7eSjoerg `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands). 366706f32e7eSjoerg 366806f32e7eSjoerg.. _m_Instruction: 366906f32e7eSjoerg 367006f32e7eSjoergImportant Public Members of the ``Instruction`` class 367106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 367206f32e7eSjoerg 367306f32e7eSjoerg* ``BasicBlock *getParent()`` 367406f32e7eSjoerg 367506f32e7eSjoerg Returns the BasicBlock_ that this 367606f32e7eSjoerg ``Instruction`` is embedded into. 367706f32e7eSjoerg 367806f32e7eSjoerg* ``bool mayWriteToMemory()`` 367906f32e7eSjoerg 368006f32e7eSjoerg Returns true if the instruction writes to memory, i.e. it is a ``call``, 368106f32e7eSjoerg ``free``, ``invoke``, or ``store``. 368206f32e7eSjoerg 368306f32e7eSjoerg* ``unsigned getOpcode()`` 368406f32e7eSjoerg 368506f32e7eSjoerg Returns the opcode for the ``Instruction``. 368606f32e7eSjoerg 368706f32e7eSjoerg* ``Instruction *clone() const`` 368806f32e7eSjoerg 368906f32e7eSjoerg Returns another instance of the specified instruction, identical in all ways 369006f32e7eSjoerg to the original except that the instruction has no parent (i.e. it's not 369106f32e7eSjoerg embedded into a BasicBlock_), and it has no name. 369206f32e7eSjoerg 369306f32e7eSjoerg.. _Constant: 369406f32e7eSjoerg 369506f32e7eSjoergThe ``Constant`` class and subclasses 369606f32e7eSjoerg------------------------------------- 369706f32e7eSjoerg 369806f32e7eSjoergConstant represents a base class for different types of constants. It is 369906f32e7eSjoergsubclassed by ConstantInt, ConstantArray, etc. for representing the various 370006f32e7eSjoergtypes of Constants. GlobalValue_ is also a subclass, which represents the 370106f32e7eSjoergaddress of a global variable or function. 370206f32e7eSjoerg 370306f32e7eSjoerg.. _s_Constant: 370406f32e7eSjoerg 370506f32e7eSjoergImportant Subclasses of Constant 370606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 370706f32e7eSjoerg 370806f32e7eSjoerg* ConstantInt : This subclass of Constant represents an integer constant of 370906f32e7eSjoerg any width. 371006f32e7eSjoerg 371106f32e7eSjoerg * ``const APInt& getValue() const``: Returns the underlying 371206f32e7eSjoerg value of this constant, an APInt value. 371306f32e7eSjoerg 371406f32e7eSjoerg * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an 371506f32e7eSjoerg int64_t via sign extension. If the value (not the bit width) of the APInt 371606f32e7eSjoerg is too large to fit in an int64_t, an assertion will result. For this 371706f32e7eSjoerg reason, use of this method is discouraged. 371806f32e7eSjoerg 371906f32e7eSjoerg * ``uint64_t getZExtValue() const``: Converts the underlying APInt value 372006f32e7eSjoerg to a uint64_t via zero extension. IF the value (not the bit width) of the 372106f32e7eSjoerg APInt is too large to fit in a uint64_t, an assertion will result. For this 372206f32e7eSjoerg reason, use of this method is discouraged. 372306f32e7eSjoerg 372406f32e7eSjoerg * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt 372506f32e7eSjoerg object that represents the value provided by ``Val``. The type is implied 372606f32e7eSjoerg as the IntegerType that corresponds to the bit width of ``Val``. 372706f32e7eSjoerg 372806f32e7eSjoerg * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the 372906f32e7eSjoerg ConstantInt object that represents the value provided by ``Val`` for integer 373006f32e7eSjoerg type ``Ty``. 373106f32e7eSjoerg 373206f32e7eSjoerg* ConstantFP : This class represents a floating point constant. 373306f32e7eSjoerg 373406f32e7eSjoerg * ``double getValue() const``: Returns the underlying value of this constant. 373506f32e7eSjoerg 373606f32e7eSjoerg* ConstantArray : This represents a constant array. 373706f32e7eSjoerg 373806f32e7eSjoerg * ``const std::vector<Use> &getValues() const``: Returns a vector of 373906f32e7eSjoerg component constants that makeup this array. 374006f32e7eSjoerg 374106f32e7eSjoerg* ConstantStruct : This represents a constant struct. 374206f32e7eSjoerg 374306f32e7eSjoerg * ``const std::vector<Use> &getValues() const``: Returns a vector of 374406f32e7eSjoerg component constants that makeup this array. 374506f32e7eSjoerg 374606f32e7eSjoerg* GlobalValue : This represents either a global variable or a function. In 374706f32e7eSjoerg either case, the value is a constant fixed address (after linking). 374806f32e7eSjoerg 374906f32e7eSjoerg.. _GlobalValue: 375006f32e7eSjoerg 375106f32e7eSjoergThe ``GlobalValue`` class 375206f32e7eSjoerg------------------------- 375306f32e7eSjoerg 375406f32e7eSjoerg``#include "llvm/IR/GlobalValue.h"`` 375506f32e7eSjoerg 375606f32e7eSjoergheader source: `GlobalValue.h 3757*da58b97aSjoerg<https://llvm.org/doxygen/GlobalValue_8h_source.html>`_ 375806f32e7eSjoerg 375906f32e7eSjoergdoxygen info: `GlobalValue Class 3760*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_ 376106f32e7eSjoerg 376206f32e7eSjoergSuperclasses: Constant_, User_, Value_ 376306f32e7eSjoerg 376406f32e7eSjoergGlobal values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the 376506f32e7eSjoergonly LLVM values that are visible in the bodies of all :ref:`Function 376606f32e7eSjoerg<c_Function>`\ s. Because they are visible at global scope, they are also 376706f32e7eSjoergsubject to linking with other globals defined in different translation units. 376806f32e7eSjoergTo control the linking process, ``GlobalValue``\ s know their linkage rules. 376906f32e7eSjoergSpecifically, ``GlobalValue``\ s know whether they have internal or external 377006f32e7eSjoerglinkage, as defined by the ``LinkageTypes`` enumeration. 377106f32e7eSjoerg 377206f32e7eSjoergIf a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C), 377306f32e7eSjoergit is not visible to code outside the current translation unit, and does not 377406f32e7eSjoergparticipate in linking. If it has external linkage, it is visible to external 377506f32e7eSjoergcode, and does participate in linking. In addition to linkage information, 377606f32e7eSjoerg``GlobalValue``\ s keep track of which Module_ they are currently part of. 377706f32e7eSjoerg 377806f32e7eSjoergBecause ``GlobalValue``\ s are memory objects, they are always referred to by 377906f32e7eSjoergtheir **address**. As such, the Type_ of a global is always a pointer to its 378006f32e7eSjoergcontents. It is important to remember this when using the ``GetElementPtrInst`` 378106f32e7eSjoerginstruction because this pointer must be dereferenced first. For example, if 378206f32e7eSjoergyou have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array 378306f32e7eSjoergof 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to 378406f32e7eSjoergthat array. Although the address of the first element of this array and the 378506f32e7eSjoergvalue of the ``GlobalVariable`` are the same, they have different types. The 378606f32e7eSjoerg``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is 378706f32e7eSjoerg``i32.`` Because of this, accessing a global value requires you to dereference 378806f32e7eSjoergthe pointer with ``GetElementPtrInst`` first, then its elements can be accessed. 378906f32e7eSjoergThis is explained in the `LLVM Language Reference Manual 379006f32e7eSjoerg<LangRef.html#globalvars>`_. 379106f32e7eSjoerg 379206f32e7eSjoerg.. _m_GlobalValue: 379306f32e7eSjoerg 379406f32e7eSjoergImportant Public Members of the ``GlobalValue`` class 379506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 379606f32e7eSjoerg 379706f32e7eSjoerg* | ``bool hasInternalLinkage() const`` 379806f32e7eSjoerg | ``bool hasExternalLinkage() const`` 379906f32e7eSjoerg | ``void setInternalLinkage(bool HasInternalLinkage)`` 380006f32e7eSjoerg 380106f32e7eSjoerg These methods manipulate the linkage characteristics of the ``GlobalValue``. 380206f32e7eSjoerg 380306f32e7eSjoerg* ``Module *getParent()`` 380406f32e7eSjoerg 380506f32e7eSjoerg This returns the Module_ that the 380606f32e7eSjoerg GlobalValue is currently embedded into. 380706f32e7eSjoerg 380806f32e7eSjoerg.. _c_Function: 380906f32e7eSjoerg 381006f32e7eSjoergThe ``Function`` class 381106f32e7eSjoerg---------------------- 381206f32e7eSjoerg 381306f32e7eSjoerg``#include "llvm/IR/Function.h"`` 381406f32e7eSjoerg 3815*da58b97aSjoergheader source: `Function.h <https://llvm.org/doxygen/Function_8h_source.html>`_ 381606f32e7eSjoerg 381706f32e7eSjoergdoxygen info: `Function Class 3818*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Function.html>`_ 381906f32e7eSjoerg 382006f32e7eSjoergSuperclasses: GlobalValue_, Constant_, User_, Value_ 382106f32e7eSjoerg 382206f32e7eSjoergThe ``Function`` class represents a single procedure in LLVM. It is actually 382306f32e7eSjoergone of the more complex classes in the LLVM hierarchy because it must keep track 382406f32e7eSjoergof a large amount of data. The ``Function`` class keeps track of a list of 382506f32e7eSjoergBasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_. 382606f32e7eSjoerg 382706f32e7eSjoergThe list of BasicBlock_\ s is the most commonly used part of ``Function`` 382806f32e7eSjoergobjects. The list imposes an implicit ordering of the blocks in the function, 382906f32e7eSjoergwhich indicate how the code will be laid out by the backend. Additionally, the 383006f32e7eSjoergfirst BasicBlock_ is the implicit entry node for the ``Function``. It is not 383106f32e7eSjoerglegal in LLVM to explicitly branch to this initial block. There are no implicit 383206f32e7eSjoergexit nodes, and in fact there may be multiple exit nodes from a single 383306f32e7eSjoerg``Function``. If the BasicBlock_ list is empty, this indicates that the 383406f32e7eSjoerg``Function`` is actually a function declaration: the actual body of the function 383506f32e7eSjoerghasn't been linked in yet. 383606f32e7eSjoerg 383706f32e7eSjoergIn addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track 383806f32e7eSjoergof the list of formal Argument_\ s that the function receives. This container 383906f32e7eSjoergmanages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does 384006f32e7eSjoergfor the BasicBlock_\ s. 384106f32e7eSjoerg 384206f32e7eSjoergThe SymbolTable_ is a very rarely used LLVM feature that is only used when you 384306f32e7eSjoerghave to look up a value by name. Aside from that, the SymbolTable_ is used 384406f32e7eSjoerginternally to make sure that there are not conflicts between the names of 384506f32e7eSjoergInstruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body. 384606f32e7eSjoerg 384706f32e7eSjoergNote that ``Function`` is a GlobalValue_ and therefore also a Constant_. The 384806f32e7eSjoergvalue of the function is its address (after linking) which is guaranteed to be 384906f32e7eSjoergconstant. 385006f32e7eSjoerg 385106f32e7eSjoerg.. _m_Function: 385206f32e7eSjoerg 385306f32e7eSjoergImportant Public Members of the ``Function`` 385406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 385506f32e7eSjoerg 385606f32e7eSjoerg* ``Function(const FunctionType *Ty, LinkageTypes Linkage, 385706f32e7eSjoerg const std::string &N = "", Module* Parent = 0)`` 385806f32e7eSjoerg 385906f32e7eSjoerg Constructor used when you need to create new ``Function``\ s to add the 386006f32e7eSjoerg program. The constructor must specify the type of the function to create and 386106f32e7eSjoerg what type of linkage the function should have. The FunctionType_ argument 386206f32e7eSjoerg specifies the formal arguments and return value for the function. The same 386306f32e7eSjoerg FunctionType_ value can be used to create multiple functions. The ``Parent`` 386406f32e7eSjoerg argument specifies the Module in which the function is defined. If this 386506f32e7eSjoerg argument is provided, the function will automatically be inserted into that 386606f32e7eSjoerg module's list of functions. 386706f32e7eSjoerg 386806f32e7eSjoerg* ``bool isDeclaration()`` 386906f32e7eSjoerg 387006f32e7eSjoerg Return whether or not the ``Function`` has a body defined. If the function is 387106f32e7eSjoerg "external", it does not have a body, and thus must be resolved by linking with 387206f32e7eSjoerg a function defined in a different translation unit. 387306f32e7eSjoerg 387406f32e7eSjoerg* | ``Function::iterator`` - Typedef for basic block list iterator 387506f32e7eSjoerg | ``Function::const_iterator`` - Typedef for const_iterator. 387606f32e7eSjoerg | ``begin()``, ``end()``, ``size()``, ``empty()`` 387706f32e7eSjoerg 387806f32e7eSjoerg These are forwarding methods that make it easy to access the contents of a 387906f32e7eSjoerg ``Function`` object's BasicBlock_ list. 388006f32e7eSjoerg 388106f32e7eSjoerg* ``Function::BasicBlockListType &getBasicBlockList()`` 388206f32e7eSjoerg 388306f32e7eSjoerg Returns the list of BasicBlock_\ s. This is necessary to use when you need to 388406f32e7eSjoerg update the list or perform a complex action that doesn't have a forwarding 388506f32e7eSjoerg method. 388606f32e7eSjoerg 388706f32e7eSjoerg* | ``Function::arg_iterator`` - Typedef for the argument list iterator 388806f32e7eSjoerg | ``Function::const_arg_iterator`` - Typedef for const_iterator. 388906f32e7eSjoerg | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()`` 389006f32e7eSjoerg 389106f32e7eSjoerg These are forwarding methods that make it easy to access the contents of a 389206f32e7eSjoerg ``Function`` object's Argument_ list. 389306f32e7eSjoerg 389406f32e7eSjoerg* ``Function::ArgumentListType &getArgumentList()`` 389506f32e7eSjoerg 389606f32e7eSjoerg Returns the list of Argument_. This is necessary to use when you need to 389706f32e7eSjoerg update the list or perform a complex action that doesn't have a forwarding 389806f32e7eSjoerg method. 389906f32e7eSjoerg 390006f32e7eSjoerg* ``BasicBlock &getEntryBlock()`` 390106f32e7eSjoerg 390206f32e7eSjoerg Returns the entry ``BasicBlock`` for the function. Because the entry block 390306f32e7eSjoerg for the function is always the first block, this returns the first block of 390406f32e7eSjoerg the ``Function``. 390506f32e7eSjoerg 390606f32e7eSjoerg* | ``Type *getReturnType()`` 390706f32e7eSjoerg | ``FunctionType *getFunctionType()`` 390806f32e7eSjoerg 390906f32e7eSjoerg This traverses the Type_ of the ``Function`` and returns the return type of 391006f32e7eSjoerg the function, or the FunctionType_ of the actual function. 391106f32e7eSjoerg 391206f32e7eSjoerg* ``SymbolTable *getSymbolTable()`` 391306f32e7eSjoerg 391406f32e7eSjoerg Return a pointer to the SymbolTable_ for this ``Function``. 391506f32e7eSjoerg 391606f32e7eSjoerg.. _GlobalVariable: 391706f32e7eSjoerg 391806f32e7eSjoergThe ``GlobalVariable`` class 391906f32e7eSjoerg---------------------------- 392006f32e7eSjoerg 392106f32e7eSjoerg``#include "llvm/IR/GlobalVariable.h"`` 392206f32e7eSjoerg 392306f32e7eSjoergheader source: `GlobalVariable.h 3924*da58b97aSjoerg<https://llvm.org/doxygen/GlobalVariable_8h_source.html>`_ 392506f32e7eSjoerg 392606f32e7eSjoergdoxygen info: `GlobalVariable Class 3927*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_ 392806f32e7eSjoerg 392906f32e7eSjoergSuperclasses: GlobalValue_, Constant_, User_, Value_ 393006f32e7eSjoerg 393106f32e7eSjoergGlobal variables are represented with the (surprise surprise) ``GlobalVariable`` 393206f32e7eSjoergclass. Like functions, ``GlobalVariable``\ s are also subclasses of 393306f32e7eSjoergGlobalValue_, and as such are always referenced by their address (global values 393406f32e7eSjoergmust live in memory, so their "name" refers to their constant address). See 393506f32e7eSjoergGlobalValue_ for more on this. Global variables may have an initial value 393606f32e7eSjoerg(which must be a Constant_), and if they have an initializer, they may be marked 393706f32e7eSjoergas "constant" themselves (indicating that their contents never change at 393806f32e7eSjoergruntime). 393906f32e7eSjoerg 394006f32e7eSjoerg.. _m_GlobalVariable: 394106f32e7eSjoerg 394206f32e7eSjoergImportant Public Members of the ``GlobalVariable`` class 394306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 394406f32e7eSjoerg 394506f32e7eSjoerg* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage, 394606f32e7eSjoerg Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)`` 394706f32e7eSjoerg 394806f32e7eSjoerg Create a new global variable of the specified type. If ``isConstant`` is true 394906f32e7eSjoerg then the global variable will be marked as unchanging for the program. The 395006f32e7eSjoerg Linkage parameter specifies the type of linkage (internal, external, weak, 395106f32e7eSjoerg linkonce, appending) for the variable. If the linkage is InternalLinkage, 395206f32e7eSjoerg WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then 395306f32e7eSjoerg the resultant global variable will have internal linkage. AppendingLinkage 395406f32e7eSjoerg concatenates together all instances (in different translation units) of the 395506f32e7eSjoerg variable into a single variable but is only applicable to arrays. See the 395606f32e7eSjoerg `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details 395706f32e7eSjoerg on linkage types. Optionally an initializer, a name, and the module to put 395806f32e7eSjoerg the variable into may be specified for the global variable as well. 395906f32e7eSjoerg 396006f32e7eSjoerg* ``bool isConstant() const`` 396106f32e7eSjoerg 396206f32e7eSjoerg Returns true if this is a global variable that is known not to be modified at 396306f32e7eSjoerg runtime. 396406f32e7eSjoerg 396506f32e7eSjoerg* ``bool hasInitializer()`` 396606f32e7eSjoerg 3967*da58b97aSjoerg Returns true if this ``GlobalVariable`` has an initializer. 396806f32e7eSjoerg 396906f32e7eSjoerg* ``Constant *getInitializer()`` 397006f32e7eSjoerg 397106f32e7eSjoerg Returns the initial value for a ``GlobalVariable``. It is not legal to call 397206f32e7eSjoerg this method if there is no initializer. 397306f32e7eSjoerg 397406f32e7eSjoerg.. _BasicBlock: 397506f32e7eSjoerg 397606f32e7eSjoergThe ``BasicBlock`` class 397706f32e7eSjoerg------------------------ 397806f32e7eSjoerg 397906f32e7eSjoerg``#include "llvm/IR/BasicBlock.h"`` 398006f32e7eSjoerg 398106f32e7eSjoergheader source: `BasicBlock.h 3982*da58b97aSjoerg<https://llvm.org/doxygen/BasicBlock_8h_source.html>`_ 398306f32e7eSjoerg 398406f32e7eSjoergdoxygen info: `BasicBlock Class 3985*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_ 398606f32e7eSjoerg 398706f32e7eSjoergSuperclass: Value_ 398806f32e7eSjoerg 398906f32e7eSjoergThis class represents a single entry single exit section of the code, commonly 399006f32e7eSjoergknown as a basic block by the compiler community. The ``BasicBlock`` class 399106f32e7eSjoergmaintains a list of Instruction_\ s, which form the body of the block. Matching 399206f32e7eSjoergthe language definition, the last element of this list of instructions is always 399306f32e7eSjoerga terminator instruction. 399406f32e7eSjoerg 399506f32e7eSjoergIn addition to tracking the list of instructions that make up the block, the 399606f32e7eSjoerg``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that 399706f32e7eSjoergit is embedded into. 399806f32e7eSjoerg 399906f32e7eSjoergNote that ``BasicBlock``\ s themselves are Value_\ s, because they are 400006f32e7eSjoergreferenced by instructions like branches and can go in the switch tables. 400106f32e7eSjoerg``BasicBlock``\ s have type ``label``. 400206f32e7eSjoerg 400306f32e7eSjoerg.. _m_BasicBlock: 400406f32e7eSjoerg 400506f32e7eSjoergImportant Public Members of the ``BasicBlock`` class 400606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 400706f32e7eSjoerg 400806f32e7eSjoerg* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)`` 400906f32e7eSjoerg 401006f32e7eSjoerg The ``BasicBlock`` constructor is used to create new basic blocks for 401106f32e7eSjoerg insertion into a function. The constructor optionally takes a name for the 401206f32e7eSjoerg new block, and a :ref:`Function <c_Function>` to insert it into. If the 401306f32e7eSjoerg ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically 401406f32e7eSjoerg inserted at the end of the specified :ref:`Function <c_Function>`, if not 401506f32e7eSjoerg specified, the BasicBlock must be manually inserted into the :ref:`Function 401606f32e7eSjoerg <c_Function>`. 401706f32e7eSjoerg 401806f32e7eSjoerg* | ``BasicBlock::iterator`` - Typedef for instruction list iterator 401906f32e7eSjoerg | ``BasicBlock::const_iterator`` - Typedef for const_iterator. 402006f32e7eSjoerg | ``begin()``, ``end()``, ``front()``, ``back()``, 402106f32e7eSjoerg ``size()``, ``empty()`` 402206f32e7eSjoerg STL-style functions for accessing the instruction list. 402306f32e7eSjoerg 402406f32e7eSjoerg These methods and typedefs are forwarding functions that have the same 402506f32e7eSjoerg semantics as the standard library methods of the same names. These methods 402606f32e7eSjoerg expose the underlying instruction list of a basic block in a way that is easy 402706f32e7eSjoerg to manipulate. To get the full complement of container operations (including 402806f32e7eSjoerg operations to update the list), you must use the ``getInstList()`` method. 402906f32e7eSjoerg 403006f32e7eSjoerg* ``BasicBlock::InstListType &getInstList()`` 403106f32e7eSjoerg 403206f32e7eSjoerg This method is used to get access to the underlying container that actually 403306f32e7eSjoerg holds the Instructions. This method must be used when there isn't a 403406f32e7eSjoerg forwarding function in the ``BasicBlock`` class for the operation that you 403506f32e7eSjoerg would like to perform. Because there are no forwarding functions for 403606f32e7eSjoerg "updating" operations, you need to use this if you want to update the contents 403706f32e7eSjoerg of a ``BasicBlock``. 403806f32e7eSjoerg 403906f32e7eSjoerg* ``Function *getParent()`` 404006f32e7eSjoerg 404106f32e7eSjoerg Returns a pointer to :ref:`Function <c_Function>` the block is embedded into, 404206f32e7eSjoerg or a null pointer if it is homeless. 404306f32e7eSjoerg 404406f32e7eSjoerg* ``Instruction *getTerminator()`` 404506f32e7eSjoerg 404606f32e7eSjoerg Returns a pointer to the terminator instruction that appears at the end of the 404706f32e7eSjoerg ``BasicBlock``. If there is no terminator instruction, or if the last 404806f32e7eSjoerg instruction in the block is not a terminator, then a null pointer is returned. 404906f32e7eSjoerg 405006f32e7eSjoerg.. _Argument: 405106f32e7eSjoerg 405206f32e7eSjoergThe ``Argument`` class 405306f32e7eSjoerg---------------------- 405406f32e7eSjoerg 405506f32e7eSjoergThis subclass of Value defines the interface for incoming formal arguments to a 405606f32e7eSjoergfunction. A Function maintains a list of its formal arguments. An argument has 405706f32e7eSjoerga pointer to the parent Function. 405806f32e7eSjoerg 405906f32e7eSjoerg 4060