106f32e7eSjoerg========================
206f32e7eSjoergLLVM Programmer's Manual
306f32e7eSjoerg========================
406f32e7eSjoerg
506f32e7eSjoerg.. contents::
606f32e7eSjoerg   :local:
706f32e7eSjoerg
806f32e7eSjoerg.. warning::
906f32e7eSjoerg   This is always a work in progress.
1006f32e7eSjoerg
1106f32e7eSjoerg.. _introduction:
1206f32e7eSjoerg
1306f32e7eSjoergIntroduction
1406f32e7eSjoerg============
1506f32e7eSjoerg
1606f32e7eSjoergThis document is meant to highlight some of the important classes and interfaces
1706f32e7eSjoergavailable in the LLVM source-base.  This manual is not intended to explain what
1806f32e7eSjoergLLVM is, how it works, and what LLVM code looks like.  It assumes that you know
1906f32e7eSjoergthe basics of LLVM and are interested in writing transformations or otherwise
2006f32e7eSjoerganalyzing or manipulating the code.
2106f32e7eSjoerg
2206f32e7eSjoergThis document should get you oriented so that you can find your way in the
2306f32e7eSjoergcontinuously growing source code that makes up the LLVM infrastructure.  Note
2406f32e7eSjoergthat this manual is not intended to serve as a replacement for reading the
2506f32e7eSjoergsource code, so if you think there should be a method in one of these classes to
2606f32e7eSjoergdo something, but it's not listed, check the source.  Links to the `doxygen
27*da58b97aSjoerg<https://llvm.org/doxygen/>`__ sources are provided to make this as easy as
2806f32e7eSjoergpossible.
2906f32e7eSjoerg
3006f32e7eSjoergThe first section of this document describes general information that is useful
3106f32e7eSjoergto know when working in the LLVM infrastructure, and the second describes the
3206f32e7eSjoergCore LLVM classes.  In the future this manual will be extended with information
3306f32e7eSjoergdescribing how to use extension libraries, such as dominator information, CFG
3406f32e7eSjoergtraversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
35*da58b97aSjoerg<https://llvm.org/doxygen/InstVisitor_8h_source.html>`__) template.
3606f32e7eSjoerg
3706f32e7eSjoerg.. _general:
3806f32e7eSjoerg
3906f32e7eSjoergGeneral Information
4006f32e7eSjoerg===================
4106f32e7eSjoerg
4206f32e7eSjoergThis section contains general information that is useful if you are working in
4306f32e7eSjoergthe LLVM source-base, but that isn't specific to any particular API.
4406f32e7eSjoerg
4506f32e7eSjoerg.. _stl:
4606f32e7eSjoerg
4706f32e7eSjoergThe C++ Standard Template Library
4806f32e7eSjoerg---------------------------------
4906f32e7eSjoerg
5006f32e7eSjoergLLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
5106f32e7eSjoergmore than you are used to, or have seen before.  Because of this, you might want
5206f32e7eSjoergto do a little background reading in the techniques used and capabilities of the
5306f32e7eSjoerglibrary.  There are many good pages that discuss the STL, and several books on
5406f32e7eSjoergthe subject that you can get, so it will not be discussed in this document.
5506f32e7eSjoerg
5606f32e7eSjoergHere are some useful links:
5706f32e7eSjoerg
5806f32e7eSjoerg#. `cppreference.com
5906f32e7eSjoerg   <http://en.cppreference.com/w/>`_ - an excellent
6006f32e7eSjoerg   reference for the STL and other parts of the standard C++ library.
6106f32e7eSjoerg
6206f32e7eSjoerg#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
6306f32e7eSjoerg   book in the making.  It has a decent Standard Library Reference that rivals
6406f32e7eSjoerg   Dinkumware's, and is unfortunately no longer free since the book has been
6506f32e7eSjoerg   published.
6606f32e7eSjoerg
6706f32e7eSjoerg#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
6806f32e7eSjoerg
6906f32e7eSjoerg#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
7006f32e7eSjoerg   useful `Introduction to the STL
7106f32e7eSjoerg   <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
7206f32e7eSjoerg
7306f32e7eSjoerg#. `Bjarne Stroustrup's C++ Page
74*da58b97aSjoerg   <http://www.stroustrup.com/C++.html>`_.
7506f32e7eSjoerg
7606f32e7eSjoerg#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
7706f32e7eSjoerg   (even better, get the book)
7806f32e7eSjoerg   <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_.
7906f32e7eSjoerg
8006f32e7eSjoergYou are also encouraged to take a look at the :doc:`LLVM Coding Standards
8106f32e7eSjoerg<CodingStandards>` guide which focuses on how to write maintainable code more
8206f32e7eSjoergthan where to put your curly braces.
8306f32e7eSjoerg
8406f32e7eSjoerg.. _resources:
8506f32e7eSjoerg
8606f32e7eSjoergOther useful references
8706f32e7eSjoerg-----------------------
8806f32e7eSjoerg
8906f32e7eSjoerg#. `Using static and shared libraries across platforms
9006f32e7eSjoerg   <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
9106f32e7eSjoerg
9206f32e7eSjoerg.. _apis:
9306f32e7eSjoerg
9406f32e7eSjoergImportant and useful LLVM APIs
9506f32e7eSjoerg==============================
9606f32e7eSjoerg
9706f32e7eSjoergHere we highlight some LLVM APIs that are generally useful and good to know
9806f32e7eSjoergabout when writing transformations.
9906f32e7eSjoerg
10006f32e7eSjoerg.. _isa:
10106f32e7eSjoerg
10206f32e7eSjoergThe ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
10306f32e7eSjoerg------------------------------------------------------
10406f32e7eSjoerg
10506f32e7eSjoergThe LLVM source-base makes extensive use of a custom form of RTTI.  These
10606f32e7eSjoergtemplates have many similarities to the C++ ``dynamic_cast<>`` operator, but
10706f32e7eSjoergthey don't have some drawbacks (primarily stemming from the fact that
10806f32e7eSjoerg``dynamic_cast<>`` only works on classes that have a v-table).  Because they are
10906f32e7eSjoergused so often, you must know what they do and how they work.  All of these
11006f32e7eSjoergtemplates are defined in the ``llvm/Support/Casting.h`` (`doxygen
111*da58b97aSjoerg<https://llvm.org/doxygen/Casting_8h_source.html>`__) file (note that you very
11206f32e7eSjoergrarely have to include this file directly).
11306f32e7eSjoerg
11406f32e7eSjoerg``isa<>``:
11506f32e7eSjoerg  The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
11606f32e7eSjoerg  It returns true or false depending on whether a reference or pointer points to
11706f32e7eSjoerg  an instance of the specified class.  This can be very useful for constraint
11806f32e7eSjoerg  checking of various sorts (example below).
11906f32e7eSjoerg
12006f32e7eSjoerg``cast<>``:
12106f32e7eSjoerg  The ``cast<>`` operator is a "checked cast" operation.  It converts a pointer
12206f32e7eSjoerg  or reference from a base class to a derived class, causing an assertion
12306f32e7eSjoerg  failure if it is not really an instance of the right type.  This should be
12406f32e7eSjoerg  used in cases where you have some information that makes you believe that
12506f32e7eSjoerg  something is of the right type.  An example of the ``isa<>`` and ``cast<>``
12606f32e7eSjoerg  template is:
12706f32e7eSjoerg
12806f32e7eSjoerg  .. code-block:: c++
12906f32e7eSjoerg
13006f32e7eSjoerg    static bool isLoopInvariant(const Value *V, const Loop *L) {
13106f32e7eSjoerg      if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
13206f32e7eSjoerg        return true;
13306f32e7eSjoerg
13406f32e7eSjoerg      // Otherwise, it must be an instruction...
13506f32e7eSjoerg      return !L->contains(cast<Instruction>(V)->getParent());
13606f32e7eSjoerg    }
13706f32e7eSjoerg
13806f32e7eSjoerg  Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
13906f32e7eSjoerg  for that use the ``dyn_cast<>`` operator.
14006f32e7eSjoerg
14106f32e7eSjoerg``dyn_cast<>``:
14206f32e7eSjoerg  The ``dyn_cast<>`` operator is a "checking cast" operation.  It checks to see
14306f32e7eSjoerg  if the operand is of the specified type, and if so, returns a pointer to it
14406f32e7eSjoerg  (this operator does not work with references).  If the operand is not of the
14506f32e7eSjoerg  correct type, a null pointer is returned.  Thus, this works very much like
14606f32e7eSjoerg  the ``dynamic_cast<>`` operator in C++, and should be used in the same
14706f32e7eSjoerg  circumstances.  Typically, the ``dyn_cast<>`` operator is used in an ``if``
14806f32e7eSjoerg  statement or some other flow control statement like this:
14906f32e7eSjoerg
15006f32e7eSjoerg  .. code-block:: c++
15106f32e7eSjoerg
15206f32e7eSjoerg    if (auto *AI = dyn_cast<AllocationInst>(Val)) {
15306f32e7eSjoerg      // ...
15406f32e7eSjoerg    }
15506f32e7eSjoerg
15606f32e7eSjoerg  This form of the ``if`` statement effectively combines together a call to
15706f32e7eSjoerg  ``isa<>`` and a call to ``cast<>`` into one statement, which is very
15806f32e7eSjoerg  convenient.
15906f32e7eSjoerg
16006f32e7eSjoerg  Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
16106f32e7eSjoerg  ``instanceof`` operator, can be abused.  In particular, you should not use big
16206f32e7eSjoerg  chained ``if/then/else`` blocks to check for lots of different variants of
16306f32e7eSjoerg  classes.  If you find yourself wanting to do this, it is much cleaner and more
16406f32e7eSjoerg  efficient to use the ``InstVisitor`` class to dispatch over the instruction
16506f32e7eSjoerg  type directly.
16606f32e7eSjoerg
16706f32e7eSjoerg``isa_and_nonnull<>``:
16806f32e7eSjoerg  The ``isa_and_nonnull<>`` operator works just like the ``isa<>`` operator,
16906f32e7eSjoerg  except that it allows for a null pointer as an argument (which it then
17006f32e7eSjoerg  returns false).  This can sometimes be useful, allowing you to combine several
17106f32e7eSjoerg  null checks into one.
17206f32e7eSjoerg
17306f32e7eSjoerg``cast_or_null<>``:
17406f32e7eSjoerg  The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
17506f32e7eSjoerg  except that it allows for a null pointer as an argument (which it then
17606f32e7eSjoerg  propagates).  This can sometimes be useful, allowing you to combine several
17706f32e7eSjoerg  null checks into one.
17806f32e7eSjoerg
17906f32e7eSjoerg``dyn_cast_or_null<>``:
18006f32e7eSjoerg  The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
18106f32e7eSjoerg  operator, except that it allows for a null pointer as an argument (which it
18206f32e7eSjoerg  then propagates).  This can sometimes be useful, allowing you to combine
18306f32e7eSjoerg  several null checks into one.
18406f32e7eSjoerg
18506f32e7eSjoergThese five templates can be used with any classes, whether they have a v-table
18606f32e7eSjoergor not.  If you want to add support for these templates, see the document
18706f32e7eSjoerg:doc:`How to set up LLVM-style RTTI for your class hierarchy
18806f32e7eSjoerg<HowToSetUpLLVMStyleRTTI>`
18906f32e7eSjoerg
19006f32e7eSjoerg.. _string_apis:
19106f32e7eSjoerg
19206f32e7eSjoergPassing strings (the ``StringRef`` and ``Twine`` classes)
19306f32e7eSjoerg---------------------------------------------------------
19406f32e7eSjoerg
19506f32e7eSjoergAlthough LLVM generally does not do much string manipulation, we do have several
19606f32e7eSjoergimportant APIs which take strings.  Two important examples are the Value class
19706f32e7eSjoerg-- which has names for instructions, functions, etc. -- and the ``StringMap``
19806f32e7eSjoergclass which is used extensively in LLVM and Clang.
19906f32e7eSjoerg
20006f32e7eSjoergThese are generic classes, and they need to be able to accept strings which may
20106f32e7eSjoerghave embedded null characters.  Therefore, they cannot simply take a ``const
20206f32e7eSjoergchar *``, and taking a ``const std::string&`` requires clients to perform a heap
20306f32e7eSjoergallocation which is usually unnecessary.  Instead, many LLVM APIs use a
20406f32e7eSjoerg``StringRef`` or a ``const Twine&`` for passing strings efficiently.
20506f32e7eSjoerg
20606f32e7eSjoerg.. _StringRef:
20706f32e7eSjoerg
20806f32e7eSjoergThe ``StringRef`` class
20906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21006f32e7eSjoerg
21106f32e7eSjoergThe ``StringRef`` data type represents a reference to a constant string (a
21206f32e7eSjoergcharacter array and a length) and supports the common operations available on
21306f32e7eSjoerg``std::string``, but does not require heap allocation.
21406f32e7eSjoerg
21506f32e7eSjoergIt can be implicitly constructed using a C style null-terminated string, an
21606f32e7eSjoerg``std::string``, or explicitly with a character pointer and length.  For
21706f32e7eSjoergexample, the ``StringRef`` find function is declared as:
21806f32e7eSjoerg
21906f32e7eSjoerg.. code-block:: c++
22006f32e7eSjoerg
22106f32e7eSjoerg  iterator find(StringRef Key);
22206f32e7eSjoerg
22306f32e7eSjoergand clients can call it using any one of:
22406f32e7eSjoerg
22506f32e7eSjoerg.. code-block:: c++
22606f32e7eSjoerg
22706f32e7eSjoerg  Map.find("foo");                 // Lookup "foo"
22806f32e7eSjoerg  Map.find(std::string("bar"));    // Lookup "bar"
22906f32e7eSjoerg  Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
23006f32e7eSjoerg
23106f32e7eSjoergSimilarly, APIs which need to return a string may return a ``StringRef``
23206f32e7eSjoerginstance, which can be used directly or converted to an ``std::string`` using
23306f32e7eSjoergthe ``str`` member function.  See ``llvm/ADT/StringRef.h`` (`doxygen
234*da58b97aSjoerg<https://llvm.org/doxygen/StringRef_8h_source.html>`__) for more
23506f32e7eSjoerginformation.
23606f32e7eSjoerg
23706f32e7eSjoergYou should rarely use the ``StringRef`` class directly, because it contains
23806f32e7eSjoergpointers to external memory it is not generally safe to store an instance of the
23906f32e7eSjoergclass (unless you know that the external storage will not be freed).
24006f32e7eSjoerg``StringRef`` is small and pervasive enough in LLVM that it should always be
24106f32e7eSjoergpassed by value.
24206f32e7eSjoerg
24306f32e7eSjoergThe ``Twine`` class
24406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
24506f32e7eSjoerg
246*da58b97aSjoergThe ``Twine`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
24706f32e7eSjoergclass is an efficient way for APIs to accept concatenated strings.  For example,
24806f32e7eSjoerga common LLVM paradigm is to name one instruction based on the name of another
24906f32e7eSjoerginstruction with a suffix, for example:
25006f32e7eSjoerg
25106f32e7eSjoerg.. code-block:: c++
25206f32e7eSjoerg
25306f32e7eSjoerg    New = CmpInst::Create(..., SO->getName() + ".cmp");
25406f32e7eSjoerg
25506f32e7eSjoergThe ``Twine`` class is effectively a lightweight `rope
25606f32e7eSjoerg<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
25706f32e7eSjoergtemporary (stack allocated) objects.  Twines can be implicitly constructed as
25806f32e7eSjoergthe result of the plus operator applied to strings (i.e., a C strings, an
25906f32e7eSjoerg``std::string``, or a ``StringRef``).  The twine delays the actual concatenation
26006f32e7eSjoergof strings until it is actually required, at which point it can be efficiently
26106f32e7eSjoergrendered directly into a character array.  This avoids unnecessary heap
26206f32e7eSjoergallocation involved in constructing the temporary results of string
26306f32e7eSjoergconcatenation.  See ``llvm/ADT/Twine.h`` (`doxygen
264*da58b97aSjoerg<https://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
26506f32e7eSjoergfor more information.
26606f32e7eSjoerg
26706f32e7eSjoergAs with a ``StringRef``, ``Twine`` objects point to external memory and should
26806f32e7eSjoergalmost never be stored or mentioned directly.  They are intended solely for use
26906f32e7eSjoergwhen defining a function which should be able to efficiently accept concatenated
27006f32e7eSjoergstrings.
27106f32e7eSjoerg
27206f32e7eSjoerg.. _formatting_strings:
27306f32e7eSjoerg
27406f32e7eSjoergFormatting strings (the ``formatv`` function)
27506f32e7eSjoerg---------------------------------------------
27606f32e7eSjoergWhile LLVM doesn't necessarily do a lot of string manipulation and parsing, it
27706f32e7eSjoergdoes do a lot of string formatting.  From diagnostic messages, to llvm tool
27806f32e7eSjoergoutputs such as ``llvm-readobj`` to printing verbose disassembly listings and
27906f32e7eSjoergLLDB runtime logging, the need for string formatting is pervasive.
28006f32e7eSjoerg
28106f32e7eSjoergThe ``formatv`` is similar in spirit to ``printf``, but uses a different syntax
28206f32e7eSjoergwhich borrows heavily from Python and C#.  Unlike ``printf`` it deduces the type
28306f32e7eSjoergto be formatted at compile time, so it does not need a format specifier such as
28406f32e7eSjoerg``%d``.  This reduces the mental overhead of trying to construct portable format
28506f32e7eSjoergstrings, especially for platform-specific types like ``size_t`` or pointer types.
28606f32e7eSjoergUnlike both ``printf`` and Python, it additionally fails to compile if LLVM does
28706f32e7eSjoergnot know how to format the type.  These two properties ensure that the function
28806f32e7eSjoergis both safer and simpler to use than traditional formatting methods such as
28906f32e7eSjoergthe ``printf`` family of functions.
29006f32e7eSjoerg
29106f32e7eSjoergSimple formatting
29206f32e7eSjoerg^^^^^^^^^^^^^^^^^
29306f32e7eSjoerg
29406f32e7eSjoergA call to ``formatv`` involves a single **format string** consisting of 0 or more
29506f32e7eSjoerg**replacement sequences**, followed by a variable length list of **replacement values**.
29606f32e7eSjoergA replacement sequence is a string of the form ``{N[[,align]:style]}``.
29706f32e7eSjoerg
29806f32e7eSjoerg``N`` refers to the 0-based index of the argument from the list of replacement
29906f32e7eSjoergvalues.  Note that this means it is possible to reference the same parameter
30006f32e7eSjoergmultiple times, possibly with different style and/or alignment options, in any order.
30106f32e7eSjoerg
30206f32e7eSjoerg``align`` is an optional string specifying the width of the field to format
30306f32e7eSjoergthe value into, and the alignment of the value within the field.  It is specified as
30406f32e7eSjoergan optional **alignment style** followed by a positive integral **field width**.  The
30506f32e7eSjoergalignment style can be one of the characters ``-`` (left align), ``=`` (center align),
30606f32e7eSjoergor ``+`` (right align).  The default is right aligned.
30706f32e7eSjoerg
30806f32e7eSjoerg``style`` is an optional string consisting of a type specific that controls the
30906f32e7eSjoergformatting of the value.  For example, to format a floating point value as a percentage,
31006f32e7eSjoergyou can use the style option ``P``.
31106f32e7eSjoerg
31206f32e7eSjoergCustom formatting
31306f32e7eSjoerg^^^^^^^^^^^^^^^^^
31406f32e7eSjoerg
31506f32e7eSjoergThere are two ways to customize the formatting behavior for a type.
31606f32e7eSjoerg
31706f32e7eSjoerg1. Provide a template specialization of ``llvm::format_provider<T>`` for your
31806f32e7eSjoerg   type ``T`` with the appropriate static format method.
31906f32e7eSjoerg
32006f32e7eSjoerg  .. code-block:: c++
32106f32e7eSjoerg
32206f32e7eSjoerg    namespace llvm {
32306f32e7eSjoerg      template<>
32406f32e7eSjoerg      struct format_provider<MyFooBar> {
32506f32e7eSjoerg        static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) {
32606f32e7eSjoerg          // Do whatever is necessary to format `V` into `Stream`
32706f32e7eSjoerg        }
32806f32e7eSjoerg      };
32906f32e7eSjoerg      void foo() {
33006f32e7eSjoerg        MyFooBar X;
33106f32e7eSjoerg        std::string S = formatv("{0}", X);
33206f32e7eSjoerg      }
33306f32e7eSjoerg    }
33406f32e7eSjoerg
33506f32e7eSjoerg  This is a useful extensibility mechanism for adding support for formatting your own
33606f32e7eSjoerg  custom types with your own custom Style options.  But it does not help when you want
33706f32e7eSjoerg  to extend the mechanism for formatting a type that the library already knows how to
33806f32e7eSjoerg  format.  For that, we need something else.
33906f32e7eSjoerg
34006f32e7eSjoerg2. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``.
34106f32e7eSjoerg
34206f32e7eSjoerg  .. code-block:: c++
34306f32e7eSjoerg
34406f32e7eSjoerg    namespace anything {
34506f32e7eSjoerg      struct format_int_custom : public llvm::FormatAdapter<int> {
34606f32e7eSjoerg        explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
34706f32e7eSjoerg        void format(llvm::raw_ostream &Stream, StringRef Style) override {
34806f32e7eSjoerg          // Do whatever is necessary to format ``this->Item`` into ``Stream``
34906f32e7eSjoerg        }
35006f32e7eSjoerg      };
35106f32e7eSjoerg    }
35206f32e7eSjoerg    namespace llvm {
35306f32e7eSjoerg      void foo() {
35406f32e7eSjoerg        std::string S = formatv("{0}", anything::format_int_custom(42));
35506f32e7eSjoerg      }
35606f32e7eSjoerg    }
35706f32e7eSjoerg
35806f32e7eSjoerg  If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv``
35906f32e7eSjoerg  will call the
36006f32e7eSjoerg  ``format`` method on the argument passing in the specified style.  This allows
36106f32e7eSjoerg  one to provide custom formatting of any type, including one which already has
36206f32e7eSjoerg  a builtin format provider.
36306f32e7eSjoerg
36406f32e7eSjoerg``formatv`` Examples
36506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
36606f32e7eSjoergBelow is intended to provide an incomplete set of examples demonstrating
36706f32e7eSjoergthe usage of ``formatv``.  More information can be found by reading the
36806f32e7eSjoergdoxygen documentation or by looking at the unit test suite.
36906f32e7eSjoerg
37006f32e7eSjoerg
37106f32e7eSjoerg.. code-block:: c++
37206f32e7eSjoerg
37306f32e7eSjoerg  std::string S;
37406f32e7eSjoerg  // Simple formatting of basic types and implicit string conversion.
37506f32e7eSjoerg  S = formatv("{0} ({1:P})", 7, 0.35);  // S == "7 (35.00%)"
37606f32e7eSjoerg
37706f32e7eSjoerg  // Out-of-order referencing and multi-referencing
37806f32e7eSjoerg  outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"
37906f32e7eSjoerg
38006f32e7eSjoerg  // Left, right, and center alignment
38106f32e7eSjoerg  S = formatv("{0,7}",  'a');  // S == "      a";
38206f32e7eSjoerg  S = formatv("{0,-7}", 'a');  // S == "a      ";
38306f32e7eSjoerg  S = formatv("{0,=7}", 'a');  // S == "   a   ";
38406f32e7eSjoerg  S = formatv("{0,+7}", 'a');  // S == "      a";
38506f32e7eSjoerg
38606f32e7eSjoerg  // Custom styles
38706f32e7eSjoerg  S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"
38806f32e7eSjoerg
38906f32e7eSjoerg  // Adapters
39006f32e7eSjoerg  S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7));  // S == "  42   "
39106f32e7eSjoerg  S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
39206f32e7eSjoerg  S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == "  hi      "
39306f32e7eSjoerg
39406f32e7eSjoerg  // Ranges
39506f32e7eSjoerg  std::vector<int> V = {8, 9, 10};
39606f32e7eSjoerg  S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
39706f32e7eSjoerg  S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10"
39806f32e7eSjoerg  S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA"
39906f32e7eSjoerg
40006f32e7eSjoerg.. _error_apis:
40106f32e7eSjoerg
40206f32e7eSjoergError handling
40306f32e7eSjoerg--------------
40406f32e7eSjoerg
40506f32e7eSjoergProper error handling helps us identify bugs in our code, and helps end-users
40606f32e7eSjoergunderstand errors in their tool usage. Errors fall into two broad categories:
40706f32e7eSjoerg*programmatic* and *recoverable*, with different strategies for handling and
40806f32e7eSjoergreporting.
40906f32e7eSjoerg
41006f32e7eSjoergProgrammatic Errors
41106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
41206f32e7eSjoerg
41306f32e7eSjoergProgrammatic errors are violations of program invariants or API contracts, and
41406f32e7eSjoergrepresent bugs within the program itself. Our aim is to document invariants, and
41506f32e7eSjoergto abort quickly at the point of failure (providing some basic diagnostic) when
41606f32e7eSjoerginvariants are broken at runtime.
41706f32e7eSjoerg
41806f32e7eSjoergThe fundamental tools for handling programmatic errors are assertions and the
41906f32e7eSjoergllvm_unreachable function. Assertions are used to express invariant conditions,
42006f32e7eSjoergand should include a message describing the invariant:
42106f32e7eSjoerg
42206f32e7eSjoerg.. code-block:: c++
42306f32e7eSjoerg
42406f32e7eSjoerg  assert(isPhysReg(R) && "All virt regs should have been allocated already.");
42506f32e7eSjoerg
42606f32e7eSjoergThe llvm_unreachable function can be used to document areas of control flow
42706f32e7eSjoergthat should never be entered if the program invariants hold:
42806f32e7eSjoerg
42906f32e7eSjoerg.. code-block:: c++
43006f32e7eSjoerg
43106f32e7eSjoerg  enum { Foo, Bar, Baz } X = foo();
43206f32e7eSjoerg
43306f32e7eSjoerg  switch (X) {
43406f32e7eSjoerg    case Foo: /* Handle Foo */; break;
43506f32e7eSjoerg    case Bar: /* Handle Bar */; break;
43606f32e7eSjoerg    default:
43706f32e7eSjoerg      llvm_unreachable("X should be Foo or Bar here");
43806f32e7eSjoerg  }
43906f32e7eSjoerg
44006f32e7eSjoergRecoverable Errors
44106f32e7eSjoerg^^^^^^^^^^^^^^^^^^
44206f32e7eSjoerg
44306f32e7eSjoergRecoverable errors represent an error in the program's environment, for example
44406f32e7eSjoerga resource failure (a missing file, a dropped network connection, etc.), or
44506f32e7eSjoergmalformed input. These errors should be detected and communicated to a level of
44606f32e7eSjoergthe program where they can be handled appropriately. Handling the error may be
44706f32e7eSjoergas simple as reporting the issue to the user, or it may involve attempts at
44806f32e7eSjoergrecovery.
44906f32e7eSjoerg
45006f32e7eSjoerg.. note::
45106f32e7eSjoerg
45206f32e7eSjoerg   While it would be ideal to use this error handling scheme throughout
45306f32e7eSjoerg   LLVM, there are places where this hasn't been practical to apply. In
45406f32e7eSjoerg   situations where you absolutely must emit a non-programmatic error and
45506f32e7eSjoerg   the ``Error`` model isn't workable you can call ``report_fatal_error``,
456*da58b97aSjoerg   which will call installed error handlers, print a message, and abort the
457*da58b97aSjoerg   program. The use of `report_fatal_error` in this case is discouraged.
45806f32e7eSjoerg
45906f32e7eSjoergRecoverable errors are modeled using LLVM's ``Error`` scheme. This scheme
46006f32e7eSjoergrepresents errors using function return values, similar to classic C integer
46106f32e7eSjoergerror codes, or C++'s ``std::error_code``. However, the ``Error`` class is
46206f32e7eSjoergactually a lightweight wrapper for user-defined error types, allowing arbitrary
46306f32e7eSjoerginformation to be attached to describe the error. This is similar to the way C++
46406f32e7eSjoergexceptions allow throwing of user-defined types.
46506f32e7eSjoerg
46606f32e7eSjoergSuccess values are created by calling ``Error::success()``, E.g.:
46706f32e7eSjoerg
46806f32e7eSjoerg.. code-block:: c++
46906f32e7eSjoerg
47006f32e7eSjoerg  Error foo() {
47106f32e7eSjoerg    // Do something.
47206f32e7eSjoerg    // Return success.
47306f32e7eSjoerg    return Error::success();
47406f32e7eSjoerg  }
47506f32e7eSjoerg
47606f32e7eSjoergSuccess values are very cheap to construct and return - they have minimal
47706f32e7eSjoergimpact on program performance.
47806f32e7eSjoerg
47906f32e7eSjoergFailure values are constructed using ``make_error<T>``, where ``T`` is any class
48006f32e7eSjoergthat inherits from the ErrorInfo utility, E.g.:
48106f32e7eSjoerg
48206f32e7eSjoerg.. code-block:: c++
48306f32e7eSjoerg
48406f32e7eSjoerg  class BadFileFormat : public ErrorInfo<BadFileFormat> {
48506f32e7eSjoerg  public:
48606f32e7eSjoerg    static char ID;
48706f32e7eSjoerg    std::string Path;
48806f32e7eSjoerg
48906f32e7eSjoerg    BadFileFormat(StringRef Path) : Path(Path.str()) {}
49006f32e7eSjoerg
49106f32e7eSjoerg    void log(raw_ostream &OS) const override {
49206f32e7eSjoerg      OS << Path << " is malformed";
49306f32e7eSjoerg    }
49406f32e7eSjoerg
49506f32e7eSjoerg    std::error_code convertToErrorCode() const override {
49606f32e7eSjoerg      return make_error_code(object_error::parse_failed);
49706f32e7eSjoerg    }
49806f32e7eSjoerg  };
49906f32e7eSjoerg
50006f32e7eSjoerg  char BadFileFormat::ID; // This should be declared in the C++ file.
50106f32e7eSjoerg
50206f32e7eSjoerg  Error printFormattedFile(StringRef Path) {
50306f32e7eSjoerg    if (<check for valid format>)
50406f32e7eSjoerg      return make_error<BadFileFormat>(Path);
50506f32e7eSjoerg    // print file contents.
50606f32e7eSjoerg    return Error::success();
50706f32e7eSjoerg  }
50806f32e7eSjoerg
50906f32e7eSjoergError values can be implicitly converted to bool: true for error, false for
51006f32e7eSjoergsuccess, enabling the following idiom:
51106f32e7eSjoerg
51206f32e7eSjoerg.. code-block:: c++
51306f32e7eSjoerg
51406f32e7eSjoerg  Error mayFail();
51506f32e7eSjoerg
51606f32e7eSjoerg  Error foo() {
51706f32e7eSjoerg    if (auto Err = mayFail())
51806f32e7eSjoerg      return Err;
51906f32e7eSjoerg    // Success! We can proceed.
52006f32e7eSjoerg    ...
52106f32e7eSjoerg
52206f32e7eSjoergFor functions that can fail but need to return a value the ``Expected<T>``
52306f32e7eSjoergutility can be used. Values of this type can be constructed with either a
52406f32e7eSjoerg``T``, or an ``Error``. Expected<T> values are also implicitly convertible to
52506f32e7eSjoergboolean, but with the opposite convention to ``Error``: true for success, false
52606f32e7eSjoergfor error. If success, the ``T`` value can be accessed via the dereference
52706f32e7eSjoergoperator. If failure, the ``Error`` value can be extracted using the
52806f32e7eSjoerg``takeError()`` method. Idiomatic usage looks like:
52906f32e7eSjoerg
53006f32e7eSjoerg.. code-block:: c++
53106f32e7eSjoerg
53206f32e7eSjoerg  Expected<FormattedFile> openFormattedFile(StringRef Path) {
53306f32e7eSjoerg    // If badly formatted, return an error.
53406f32e7eSjoerg    if (auto Err = checkFormat(Path))
53506f32e7eSjoerg      return std::move(Err);
53606f32e7eSjoerg    // Otherwise return a FormattedFile instance.
53706f32e7eSjoerg    return FormattedFile(Path);
53806f32e7eSjoerg  }
53906f32e7eSjoerg
54006f32e7eSjoerg  Error processFormattedFile(StringRef Path) {
54106f32e7eSjoerg    // Try to open a formatted file
54206f32e7eSjoerg    if (auto FileOrErr = openFormattedFile(Path)) {
54306f32e7eSjoerg      // On success, grab a reference to the file and continue.
54406f32e7eSjoerg      auto &File = *FileOrErr;
54506f32e7eSjoerg      ...
54606f32e7eSjoerg    } else
54706f32e7eSjoerg      // On error, extract the Error value and return it.
54806f32e7eSjoerg      return FileOrErr.takeError();
54906f32e7eSjoerg  }
55006f32e7eSjoerg
55106f32e7eSjoergIf an ``Expected<T>`` value is in success mode then the ``takeError()`` method
55206f32e7eSjoergwill return a success value. Using this fact, the above function can be
55306f32e7eSjoergrewritten as:
55406f32e7eSjoerg
55506f32e7eSjoerg.. code-block:: c++
55606f32e7eSjoerg
55706f32e7eSjoerg  Error processFormattedFile(StringRef Path) {
55806f32e7eSjoerg    // Try to open a formatted file
55906f32e7eSjoerg    auto FileOrErr = openFormattedFile(Path);
56006f32e7eSjoerg    if (auto Err = FileOrErr.takeError())
56106f32e7eSjoerg      // On error, extract the Error value and return it.
56206f32e7eSjoerg      return Err;
56306f32e7eSjoerg    // On success, grab a reference to the file and continue.
56406f32e7eSjoerg    auto &File = *FileOrErr;
56506f32e7eSjoerg    ...
56606f32e7eSjoerg  }
56706f32e7eSjoerg
56806f32e7eSjoergThis second form is often more readable for functions that involve multiple
56906f32e7eSjoerg``Expected<T>`` values as it limits the indentation required.
57006f32e7eSjoerg
57106f32e7eSjoergAll ``Error`` instances, whether success or failure, must be either checked or
57206f32e7eSjoergmoved from (via ``std::move`` or a return) before they are destructed.
57306f32e7eSjoergAccidentally discarding an unchecked error will cause a program abort at the
57406f32e7eSjoergpoint where the unchecked value's destructor is run, making it easy to identify
57506f32e7eSjoergand fix violations of this rule.
57606f32e7eSjoerg
57706f32e7eSjoergSuccess values are considered checked once they have been tested (by invoking
57806f32e7eSjoergthe boolean conversion operator):
57906f32e7eSjoerg
58006f32e7eSjoerg.. code-block:: c++
58106f32e7eSjoerg
58206f32e7eSjoerg  if (auto Err = mayFail(...))
58306f32e7eSjoerg    return Err; // Failure value - move error to caller.
58406f32e7eSjoerg
58506f32e7eSjoerg  // Safe to continue: Err was checked.
58606f32e7eSjoerg
58706f32e7eSjoergIn contrast, the following code will always cause an abort, even if ``mayFail``
58806f32e7eSjoergreturns a success value:
58906f32e7eSjoerg
59006f32e7eSjoerg.. code-block:: c++
59106f32e7eSjoerg
59206f32e7eSjoerg    mayFail();
59306f32e7eSjoerg    // Program will always abort here, even if mayFail() returns Success, since
59406f32e7eSjoerg    // the value is not checked.
59506f32e7eSjoerg
59606f32e7eSjoergFailure values are considered checked once a handler for the error type has
59706f32e7eSjoergbeen activated:
59806f32e7eSjoerg
59906f32e7eSjoerg.. code-block:: c++
60006f32e7eSjoerg
60106f32e7eSjoerg  handleErrors(
60206f32e7eSjoerg    processFormattedFile(...),
60306f32e7eSjoerg    [](const BadFileFormat &BFF) {
60406f32e7eSjoerg      report("Unable to process " + BFF.Path + ": bad format");
60506f32e7eSjoerg    },
60606f32e7eSjoerg    [](const FileNotFound &FNF) {
60706f32e7eSjoerg      report("File not found " + FNF.Path);
60806f32e7eSjoerg    });
60906f32e7eSjoerg
61006f32e7eSjoergThe ``handleErrors`` function takes an error as its first argument, followed by
61106f32e7eSjoerga variadic list of "handlers", each of which must be a callable type (a
61206f32e7eSjoergfunction, lambda, or class with a call operator) with one argument. The
61306f32e7eSjoerg``handleErrors`` function will visit each handler in the sequence and check its
61406f32e7eSjoergargument type against the dynamic type of the error, running the first handler
61506f32e7eSjoergthat matches. This is the same decision process that is used decide which catch
61606f32e7eSjoergclause to run for a C++ exception.
61706f32e7eSjoerg
61806f32e7eSjoergSince the list of handlers passed to ``handleErrors`` may not cover every error
61906f32e7eSjoergtype that can occur, the ``handleErrors`` function also returns an Error value
62006f32e7eSjoergthat must be checked or propagated. If the error value that is passed to
62106f32e7eSjoerg``handleErrors`` does not match any of the handlers it will be returned from
62206f32e7eSjoerghandleErrors. Idiomatic use of ``handleErrors`` thus looks like:
62306f32e7eSjoerg
62406f32e7eSjoerg.. code-block:: c++
62506f32e7eSjoerg
62606f32e7eSjoerg  if (auto Err =
62706f32e7eSjoerg        handleErrors(
62806f32e7eSjoerg          processFormattedFile(...),
62906f32e7eSjoerg          [](const BadFileFormat &BFF) {
63006f32e7eSjoerg            report("Unable to process " + BFF.Path + ": bad format");
63106f32e7eSjoerg          },
63206f32e7eSjoerg          [](const FileNotFound &FNF) {
63306f32e7eSjoerg            report("File not found " + FNF.Path);
63406f32e7eSjoerg          }))
63506f32e7eSjoerg    return Err;
63606f32e7eSjoerg
63706f32e7eSjoergIn cases where you truly know that the handler list is exhaustive the
63806f32e7eSjoerg``handleAllErrors`` function can be used instead. This is identical to
63906f32e7eSjoerg``handleErrors`` except that it will terminate the program if an unhandled
64006f32e7eSjoergerror is passed in, and can therefore return void. The ``handleAllErrors``
64106f32e7eSjoergfunction should generally be avoided: the introduction of a new error type
64206f32e7eSjoergelsewhere in the program can easily turn a formerly exhaustive list of errors
64306f32e7eSjoerginto a non-exhaustive list, risking unexpected program termination. Where
64406f32e7eSjoergpossible, use handleErrors and propagate unknown errors up the stack instead.
64506f32e7eSjoerg
64606f32e7eSjoergFor tool code, where errors can be handled by printing an error message then
64706f32e7eSjoergexiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility
64806f32e7eSjoergmay be a better choice than handleErrors, as it simplifies control flow when
64906f32e7eSjoergcalling fallible functions.
65006f32e7eSjoerg
65106f32e7eSjoergIn situations where it is known that a particular call to a fallible function
65206f32e7eSjoergwill always succeed (for example, a call to a function that can only fail on a
65306f32e7eSjoergsubset of inputs with an input that is known to be safe) the
65406f32e7eSjoerg:ref:`cantFail <err_cantfail>` functions can be used to remove the error type,
65506f32e7eSjoergsimplifying control flow.
65606f32e7eSjoerg
65706f32e7eSjoergStringError
65806f32e7eSjoerg"""""""""""
65906f32e7eSjoerg
66006f32e7eSjoergMany kinds of errors have no recovery strategy, the only action that can be
66106f32e7eSjoergtaken is to report them to the user so that the user can attempt to fix the
66206f32e7eSjoergenvironment. In this case representing the error as a string makes perfect
66306f32e7eSjoergsense. LLVM provides the ``StringError`` class for this purpose. It takes two
66406f32e7eSjoergarguments: A string error message, and an equivalent ``std::error_code`` for
66506f32e7eSjoerginteroperability. It also provides a ``createStringError`` function to simplify
66606f32e7eSjoergcommon usage of this class:
66706f32e7eSjoerg
66806f32e7eSjoerg.. code-block:: c++
66906f32e7eSjoerg
67006f32e7eSjoerg  // These two lines of code are equivalent:
67106f32e7eSjoerg  make_error<StringError>("Bad executable", errc::executable_format_error);
67206f32e7eSjoerg  createStringError(errc::executable_format_error, "Bad executable");
67306f32e7eSjoerg
67406f32e7eSjoergIf you're certain that the error you're building will never need to be converted
67506f32e7eSjoergto a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function:
67606f32e7eSjoerg
67706f32e7eSjoerg.. code-block:: c++
67806f32e7eSjoerg
67906f32e7eSjoerg  createStringError(inconvertibleErrorCode(), "Bad executable");
68006f32e7eSjoerg
68106f32e7eSjoergThis should be done only after careful consideration. If any attempt is made to
68206f32e7eSjoergconvert this error to a ``std::error_code`` it will trigger immediate program
68306f32e7eSjoergtermination. Unless you are certain that your errors will not need
68406f32e7eSjoerginteroperability you should look for an existing ``std::error_code`` that you
68506f32e7eSjoergcan convert to, and even (as painful as it is) consider introducing a new one as
68606f32e7eSjoerga stopgap measure.
68706f32e7eSjoerg
68806f32e7eSjoerg``createStringError`` can take ``printf`` style format specifiers to provide a
68906f32e7eSjoergformatted message:
69006f32e7eSjoerg
69106f32e7eSjoerg.. code-block:: c++
69206f32e7eSjoerg
69306f32e7eSjoerg  createStringError(errc::executable_format_error,
69406f32e7eSjoerg                    "Bad executable: %s", FileName);
69506f32e7eSjoerg
69606f32e7eSjoergInteroperability with std::error_code and ErrorOr
69706f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""""""""""""
69806f32e7eSjoerg
69906f32e7eSjoergMany existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>``
70006f32e7eSjoerg(which plays the same role as ``Expected<T>``, but wraps a ``std::error_code``
70106f32e7eSjoergrather than an ``Error``). The infectious nature of error types means that an
70206f32e7eSjoergattempt to change one of these functions to return ``Error`` or ``Expected<T>``
70306f32e7eSjoerginstead often results in an avalanche of changes to callers, callers of callers,
70406f32e7eSjoergand so on. (The first such attempt, returning an ``Error`` from
70506f32e7eSjoergMachOObjectFile's constructor, was abandoned after the diff reached 3000 lines,
70606f32e7eSjoergimpacted half a dozen libraries, and was still growing).
70706f32e7eSjoerg
70806f32e7eSjoergTo solve this problem, the ``Error``/``std::error_code`` interoperability requirement was
70906f32e7eSjoergintroduced. Two pairs of functions allow any ``Error`` value to be converted to a
71006f32e7eSjoerg``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice
71106f32e7eSjoergversa:
71206f32e7eSjoerg
71306f32e7eSjoerg.. code-block:: c++
71406f32e7eSjoerg
71506f32e7eSjoerg  std::error_code errorToErrorCode(Error Err);
71606f32e7eSjoerg  Error errorCodeToError(std::error_code EC);
71706f32e7eSjoerg
71806f32e7eSjoerg  template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr);
71906f32e7eSjoerg  template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC);
72006f32e7eSjoerg
72106f32e7eSjoerg
72206f32e7eSjoergUsing these APIs it is easy to make surgical patches that update individual
72306f32e7eSjoergfunctions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to
72406f32e7eSjoerg``Expected<T>``.
72506f32e7eSjoerg
72606f32e7eSjoergReturning Errors from error handlers
72706f32e7eSjoerg""""""""""""""""""""""""""""""""""""
72806f32e7eSjoerg
72906f32e7eSjoergError recovery attempts may themselves fail. For that reason, ``handleErrors``
73006f32e7eSjoergactually recognises three different forms of handler signature:
73106f32e7eSjoerg
73206f32e7eSjoerg.. code-block:: c++
73306f32e7eSjoerg
73406f32e7eSjoerg  // Error must be handled, no new errors produced:
73506f32e7eSjoerg  void(UserDefinedError &E);
73606f32e7eSjoerg
73706f32e7eSjoerg  // Error must be handled, new errors can be produced:
73806f32e7eSjoerg  Error(UserDefinedError &E);
73906f32e7eSjoerg
74006f32e7eSjoerg  // Original error can be inspected, then re-wrapped and returned (or a new
74106f32e7eSjoerg  // error can be produced):
74206f32e7eSjoerg  Error(std::unique_ptr<UserDefinedError> E);
74306f32e7eSjoerg
74406f32e7eSjoergAny error returned from a handler will be returned from the ``handleErrors``
74506f32e7eSjoergfunction so that it can be handled itself, or propagated up the stack.
74606f32e7eSjoerg
74706f32e7eSjoerg.. _err_exitonerr:
74806f32e7eSjoerg
74906f32e7eSjoergUsing ExitOnError to simplify tool code
75006f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""
75106f32e7eSjoerg
75206f32e7eSjoergLibrary code should never call ``exit`` for a recoverable error, however in tool
75306f32e7eSjoergcode (especially command line tools) this can be a reasonable approach. Calling
75406f32e7eSjoerg``exit`` upon encountering an error dramatically simplifies control flow as the
75506f32e7eSjoergerror no longer needs to be propagated up the stack. This allows code to be
75606f32e7eSjoergwritten in straight-line style, as long as each fallible call is wrapped in a
75706f32e7eSjoergcheck and call to exit. The ``ExitOnError`` class supports this pattern by
75806f32e7eSjoergproviding call operators that inspect ``Error`` values, stripping the error away
75906f32e7eSjoergin the success case and logging to ``stderr`` then exiting in the failure case.
76006f32e7eSjoerg
76106f32e7eSjoergTo use this class, declare a global ``ExitOnError`` variable in your program:
76206f32e7eSjoerg
76306f32e7eSjoerg.. code-block:: c++
76406f32e7eSjoerg
76506f32e7eSjoerg  ExitOnError ExitOnErr;
76606f32e7eSjoerg
76706f32e7eSjoergCalls to fallible functions can then be wrapped with a call to ``ExitOnErr``,
76806f32e7eSjoergturning them into non-failing calls:
76906f32e7eSjoerg
77006f32e7eSjoerg.. code-block:: c++
77106f32e7eSjoerg
77206f32e7eSjoerg  Error mayFail();
77306f32e7eSjoerg  Expected<int> mayFail2();
77406f32e7eSjoerg
77506f32e7eSjoerg  void foo() {
77606f32e7eSjoerg    ExitOnErr(mayFail());
77706f32e7eSjoerg    int X = ExitOnErr(mayFail2());
77806f32e7eSjoerg  }
77906f32e7eSjoerg
78006f32e7eSjoergOn failure, the error's log message will be written to ``stderr``, optionally
78106f32e7eSjoergpreceded by a string "banner" that can be set by calling the setBanner method. A
78206f32e7eSjoergmapping can also be supplied from ``Error`` values to exit codes using the
78306f32e7eSjoerg``setExitCodeMapper`` method:
78406f32e7eSjoerg
78506f32e7eSjoerg.. code-block:: c++
78606f32e7eSjoerg
78706f32e7eSjoerg  int main(int argc, char *argv[]) {
78806f32e7eSjoerg    ExitOnErr.setBanner(std::string(argv[0]) + " error:");
78906f32e7eSjoerg    ExitOnErr.setExitCodeMapper(
79006f32e7eSjoerg      [](const Error &Err) {
79106f32e7eSjoerg        if (Err.isA<BadFileFormat>())
79206f32e7eSjoerg          return 2;
79306f32e7eSjoerg        return 1;
79406f32e7eSjoerg      });
79506f32e7eSjoerg
79606f32e7eSjoergUse ``ExitOnError`` in your tool code where possible as it can greatly improve
79706f32e7eSjoergreadability.
79806f32e7eSjoerg
79906f32e7eSjoerg.. _err_cantfail:
80006f32e7eSjoerg
80106f32e7eSjoergUsing cantFail to simplify safe callsites
80206f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""""
80306f32e7eSjoerg
80406f32e7eSjoergSome functions may only fail for a subset of their inputs, so calls using known
80506f32e7eSjoergsafe inputs can be assumed to succeed.
80606f32e7eSjoerg
80706f32e7eSjoergThe cantFail functions encapsulate this by wrapping an assertion that their
80806f32e7eSjoergargument is a success value and, in the case of Expected<T>, unwrapping the
80906f32e7eSjoergT value:
81006f32e7eSjoerg
81106f32e7eSjoerg.. code-block:: c++
81206f32e7eSjoerg
81306f32e7eSjoerg  Error onlyFailsForSomeXValues(int X);
81406f32e7eSjoerg  Expected<int> onlyFailsForSomeXValues2(int X);
81506f32e7eSjoerg
81606f32e7eSjoerg  void foo() {
81706f32e7eSjoerg    cantFail(onlyFailsForSomeXValues(KnownSafeValue));
81806f32e7eSjoerg    int Y = cantFail(onlyFailsForSomeXValues2(KnownSafeValue));
81906f32e7eSjoerg    ...
82006f32e7eSjoerg  }
82106f32e7eSjoerg
82206f32e7eSjoergLike the ExitOnError utility, cantFail simplifies control flow. Their treatment
82306f32e7eSjoergof error cases is very different however: Where ExitOnError is guaranteed to
82406f32e7eSjoergterminate the program on an error input, cantFail simply asserts that the result
82506f32e7eSjoergis success. In debug builds this will result in an assertion failure if an error
82606f32e7eSjoergis encountered. In release builds the behavior of cantFail for failure values is
82706f32e7eSjoergundefined. As such, care must be taken in the use of cantFail: clients must be
82806f32e7eSjoergcertain that a cantFail wrapped call really can not fail with the given
82906f32e7eSjoergarguments.
83006f32e7eSjoerg
83106f32e7eSjoergUse of the cantFail functions should be rare in library code, but they are
83206f32e7eSjoerglikely to be of more use in tool and unit-test code where inputs and/or
83306f32e7eSjoergmocked-up classes or functions may be known to be safe.
83406f32e7eSjoerg
83506f32e7eSjoergFallible constructors
83606f32e7eSjoerg"""""""""""""""""""""
83706f32e7eSjoerg
83806f32e7eSjoergSome classes require resource acquisition or other complex initialization that
83906f32e7eSjoergcan fail during construction. Unfortunately constructors can't return errors,
84006f32e7eSjoergand having clients test objects after they're constructed to ensure that they're
84106f32e7eSjoergvalid is error prone as it's all too easy to forget the test. To work around
84206f32e7eSjoergthis, use the named constructor idiom and return an ``Expected<T>``:
84306f32e7eSjoerg
84406f32e7eSjoerg.. code-block:: c++
84506f32e7eSjoerg
84606f32e7eSjoerg  class Foo {
84706f32e7eSjoerg  public:
84806f32e7eSjoerg
84906f32e7eSjoerg    static Expected<Foo> Create(Resource R1, Resource R2) {
850*da58b97aSjoerg      Error Err = Error::success();
85106f32e7eSjoerg      Foo F(R1, R2, Err);
85206f32e7eSjoerg      if (Err)
85306f32e7eSjoerg        return std::move(Err);
85406f32e7eSjoerg      return std::move(F);
85506f32e7eSjoerg    }
85606f32e7eSjoerg
85706f32e7eSjoerg  private:
85806f32e7eSjoerg
85906f32e7eSjoerg    Foo(Resource R1, Resource R2, Error &Err) {
86006f32e7eSjoerg      ErrorAsOutParameter EAO(&Err);
86106f32e7eSjoerg      if (auto Err2 = R1.acquire()) {
86206f32e7eSjoerg        Err = std::move(Err2);
86306f32e7eSjoerg        return;
86406f32e7eSjoerg      }
86506f32e7eSjoerg      Err = R2.acquire();
86606f32e7eSjoerg    }
86706f32e7eSjoerg  };
86806f32e7eSjoerg
86906f32e7eSjoerg
87006f32e7eSjoergHere, the named constructor passes an ``Error`` by reference into the actual
87106f32e7eSjoergconstructor, which the constructor can then use to return errors. The
87206f32e7eSjoerg``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry
87306f32e7eSjoergto the constructor so that the error can be assigned to, then resets it on exit
87406f32e7eSjoergto force the client (the named constructor) to check the error.
87506f32e7eSjoerg
87606f32e7eSjoergBy using this idiom, clients attempting to construct a Foo receive either a
87706f32e7eSjoergwell-formed Foo or an Error, never an object in an invalid state.
87806f32e7eSjoerg
87906f32e7eSjoergPropagating and consuming errors based on types
88006f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""""""""""
88106f32e7eSjoerg
88206f32e7eSjoergIn some contexts, certain types of error are known to be benign. For example,
88306f32e7eSjoergwhen walking an archive, some clients may be happy to skip over badly formatted
88406f32e7eSjoergobject files rather than terminating the walk immediately. Skipping badly
88506f32e7eSjoergformatted objects could be achieved using an elaborate handler method, but the
88606f32e7eSjoergError.h header provides two utilities that make this idiom much cleaner: the
88706f32e7eSjoergtype inspection method, ``isA``, and the ``consumeError`` function:
88806f32e7eSjoerg
88906f32e7eSjoerg.. code-block:: c++
89006f32e7eSjoerg
89106f32e7eSjoerg  Error walkArchive(Archive A) {
89206f32e7eSjoerg    for (unsigned I = 0; I != A.numMembers(); ++I) {
89306f32e7eSjoerg      auto ChildOrErr = A.getMember(I);
89406f32e7eSjoerg      if (auto Err = ChildOrErr.takeError()) {
89506f32e7eSjoerg        if (Err.isA<BadFileFormat>())
89606f32e7eSjoerg          consumeError(std::move(Err))
89706f32e7eSjoerg        else
89806f32e7eSjoerg          return Err;
89906f32e7eSjoerg      }
90006f32e7eSjoerg      auto &Child = *ChildOrErr;
90106f32e7eSjoerg      // Use Child
90206f32e7eSjoerg      ...
90306f32e7eSjoerg    }
90406f32e7eSjoerg    return Error::success();
90506f32e7eSjoerg  }
90606f32e7eSjoerg
90706f32e7eSjoergConcatenating Errors with joinErrors
90806f32e7eSjoerg""""""""""""""""""""""""""""""""""""
90906f32e7eSjoerg
91006f32e7eSjoergIn the archive walking example above ``BadFileFormat`` errors are simply
91106f32e7eSjoergconsumed and ignored. If the client had wanted report these errors after
91206f32e7eSjoergcompleting the walk over the archive they could use the ``joinErrors`` utility:
91306f32e7eSjoerg
91406f32e7eSjoerg.. code-block:: c++
91506f32e7eSjoerg
91606f32e7eSjoerg  Error walkArchive(Archive A) {
91706f32e7eSjoerg    Error DeferredErrs = Error::success();
91806f32e7eSjoerg    for (unsigned I = 0; I != A.numMembers(); ++I) {
91906f32e7eSjoerg      auto ChildOrErr = A.getMember(I);
92006f32e7eSjoerg      if (auto Err = ChildOrErr.takeError())
92106f32e7eSjoerg        if (Err.isA<BadFileFormat>())
92206f32e7eSjoerg          DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err));
92306f32e7eSjoerg        else
92406f32e7eSjoerg          return Err;
92506f32e7eSjoerg      auto &Child = *ChildOrErr;
92606f32e7eSjoerg      // Use Child
92706f32e7eSjoerg      ...
92806f32e7eSjoerg    }
92906f32e7eSjoerg    return DeferredErrs;
93006f32e7eSjoerg  }
93106f32e7eSjoerg
93206f32e7eSjoergThe ``joinErrors`` routine builds a special error type called ``ErrorList``,
93306f32e7eSjoergwhich holds a list of user defined errors. The ``handleErrors`` routine
93406f32e7eSjoergrecognizes this type and will attempt to handle each of the contained errors in
93506f32e7eSjoergorder. If all contained errors can be handled, ``handleErrors`` will return
93606f32e7eSjoerg``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining
93706f32e7eSjoergerrors and return the resulting ``ErrorList``.
93806f32e7eSjoerg
93906f32e7eSjoergBuilding fallible iterators and iterator ranges
94006f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""""""""""
94106f32e7eSjoerg
94206f32e7eSjoergThe archive walking examples above retrieve archive members by index, however
94306f32e7eSjoergthis requires considerable boiler-plate for iteration and error checking. We can
94406f32e7eSjoergclean this up by using the "fallible iterator" pattern, which supports the
94506f32e7eSjoergfollowing natural iteration idiom for fallible containers like Archive:
94606f32e7eSjoerg
94706f32e7eSjoerg.. code-block:: c++
94806f32e7eSjoerg
949*da58b97aSjoerg  Error Err = Error::success();
95006f32e7eSjoerg  for (auto &Child : Ar->children(Err)) {
95106f32e7eSjoerg    // Use Child - only enter the loop when it's valid
95206f32e7eSjoerg
95306f32e7eSjoerg    // Allow early exit from the loop body, since we know that Err is success
95406f32e7eSjoerg    // when we're inside the loop.
95506f32e7eSjoerg    if (BailOutOn(Child))
95606f32e7eSjoerg      return;
95706f32e7eSjoerg
95806f32e7eSjoerg    ...
95906f32e7eSjoerg  }
96006f32e7eSjoerg  // Check Err after the loop to ensure it didn't break due to an error.
96106f32e7eSjoerg  if (Err)
96206f32e7eSjoerg    return Err;
96306f32e7eSjoerg
96406f32e7eSjoergTo enable this idiom, iterators over fallible containers are written in a
96506f32e7eSjoergnatural style, with their ``++`` and ``--`` operators replaced with fallible
96606f32e7eSjoerg``Error inc()`` and ``Error dec()`` functions. E.g.:
96706f32e7eSjoerg
96806f32e7eSjoerg.. code-block:: c++
96906f32e7eSjoerg
97006f32e7eSjoerg  class FallibleChildIterator {
97106f32e7eSjoerg  public:
97206f32e7eSjoerg    FallibleChildIterator(Archive &A, unsigned ChildIdx);
97306f32e7eSjoerg    Archive::Child &operator*();
97406f32e7eSjoerg    friend bool operator==(const ArchiveIterator &LHS,
97506f32e7eSjoerg                           const ArchiveIterator &RHS);
97606f32e7eSjoerg
97706f32e7eSjoerg    // operator++/operator-- replaced with fallible increment / decrement:
97806f32e7eSjoerg    Error inc() {
97906f32e7eSjoerg      if (!A.childValid(ChildIdx + 1))
98006f32e7eSjoerg        return make_error<BadArchiveMember>(...);
98106f32e7eSjoerg      ++ChildIdx;
98206f32e7eSjoerg      return Error::success();
98306f32e7eSjoerg    }
98406f32e7eSjoerg
98506f32e7eSjoerg    Error dec() { ... }
98606f32e7eSjoerg  };
98706f32e7eSjoerg
98806f32e7eSjoergInstances of this kind of fallible iterator interface are then wrapped with the
98906f32e7eSjoergfallible_iterator utility which provides ``operator++`` and ``operator--``,
99006f32e7eSjoergreturning any errors via a reference passed in to the wrapper at construction
99106f32e7eSjoergtime. The fallible_iterator wrapper takes care of (a) jumping to the end of the
99206f32e7eSjoergrange on error, and (b) marking the error as checked whenever an iterator is
99306f32e7eSjoergcompared to ``end`` and found to be inequal (in particular: this marks the
99406f32e7eSjoergerror as checked throughout the body of a range-based for loop), enabling early
99506f32e7eSjoergexit from the loop without redundant error checking.
99606f32e7eSjoerg
99706f32e7eSjoergInstances of the fallible iterator interface (e.g. FallibleChildIterator above)
99806f32e7eSjoergare wrapped using the ``make_fallible_itr`` and ``make_fallible_end``
99906f32e7eSjoergfunctions. E.g.:
100006f32e7eSjoerg
100106f32e7eSjoerg.. code-block:: c++
100206f32e7eSjoerg
100306f32e7eSjoerg  class Archive {
100406f32e7eSjoerg  public:
100506f32e7eSjoerg    using child_iterator = fallible_iterator<FallibleChildIterator>;
100606f32e7eSjoerg
100706f32e7eSjoerg    child_iterator child_begin(Error &Err) {
100806f32e7eSjoerg      return make_fallible_itr(FallibleChildIterator(*this, 0), Err);
100906f32e7eSjoerg    }
101006f32e7eSjoerg
101106f32e7eSjoerg    child_iterator child_end() {
101206f32e7eSjoerg      return make_fallible_end(FallibleChildIterator(*this, size()));
101306f32e7eSjoerg    }
101406f32e7eSjoerg
101506f32e7eSjoerg    iterator_range<child_iterator> children(Error &Err) {
101606f32e7eSjoerg      return make_range(child_begin(Err), child_end());
101706f32e7eSjoerg    }
101806f32e7eSjoerg  };
101906f32e7eSjoerg
102006f32e7eSjoergUsing the fallible_iterator utility allows for both natural construction of
102106f32e7eSjoergfallible iterators (using failing ``inc`` and ``dec`` operations) and
102206f32e7eSjoergrelatively natural use of c++ iterator/loop idioms.
102306f32e7eSjoerg
102406f32e7eSjoerg.. _function_apis:
102506f32e7eSjoerg
102606f32e7eSjoergMore information on Error and its related utilities can be found in the
102706f32e7eSjoergError.h header file.
102806f32e7eSjoerg
102906f32e7eSjoergPassing functions and other callable objects
103006f32e7eSjoerg--------------------------------------------
103106f32e7eSjoerg
103206f32e7eSjoergSometimes you may want a function to be passed a callback object. In order to
103306f32e7eSjoergsupport lambda expressions and other function objects, you should not use the
103406f32e7eSjoergtraditional C approach of taking a function pointer and an opaque cookie:
103506f32e7eSjoerg
103606f32e7eSjoerg.. code-block:: c++
103706f32e7eSjoerg
103806f32e7eSjoerg    void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
103906f32e7eSjoerg
104006f32e7eSjoergInstead, use one of the following approaches:
104106f32e7eSjoerg
104206f32e7eSjoergFunction template
104306f32e7eSjoerg^^^^^^^^^^^^^^^^^
104406f32e7eSjoerg
104506f32e7eSjoergIf you don't mind putting the definition of your function into a header file,
104606f32e7eSjoergmake it a function template that is templated on the callable type.
104706f32e7eSjoerg
104806f32e7eSjoerg.. code-block:: c++
104906f32e7eSjoerg
105006f32e7eSjoerg    template<typename Callable>
105106f32e7eSjoerg    void takeCallback(Callable Callback) {
105206f32e7eSjoerg      Callback(1, 2, 3);
105306f32e7eSjoerg    }
105406f32e7eSjoerg
105506f32e7eSjoergThe ``function_ref`` class template
105606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
105706f32e7eSjoerg
105806f32e7eSjoergThe ``function_ref``
1059*da58b97aSjoerg(`doxygen <https://llvm.org/doxygen/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class
106006f32e7eSjoergtemplate represents a reference to a callable object, templated over the type
106106f32e7eSjoergof the callable. This is a good choice for passing a callback to a function,
106206f32e7eSjoergif you don't need to hold onto the callback after the function returns. In this
106306f32e7eSjoergway, ``function_ref`` is to ``std::function`` as ``StringRef`` is to
106406f32e7eSjoerg``std::string``.
106506f32e7eSjoerg
106606f32e7eSjoerg``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from
106706f32e7eSjoergany callable object that can be called with arguments of type ``Param1``,
106806f32e7eSjoerg``Param2``, ..., and returns a value that can be converted to type ``Ret``.
106906f32e7eSjoergFor example:
107006f32e7eSjoerg
107106f32e7eSjoerg.. code-block:: c++
107206f32e7eSjoerg
107306f32e7eSjoerg    void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
107406f32e7eSjoerg      for (BasicBlock &BB : *F)
107506f32e7eSjoerg        if (Callback(&BB))
107606f32e7eSjoerg          return;
107706f32e7eSjoerg    }
107806f32e7eSjoerg
107906f32e7eSjoergcan be called using:
108006f32e7eSjoerg
108106f32e7eSjoerg.. code-block:: c++
108206f32e7eSjoerg
108306f32e7eSjoerg    visitBasicBlocks(F, [&](BasicBlock *BB) {
108406f32e7eSjoerg      if (process(BB))
108506f32e7eSjoerg        return isEmpty(BB);
108606f32e7eSjoerg      return false;
108706f32e7eSjoerg    });
108806f32e7eSjoerg
108906f32e7eSjoergNote that a ``function_ref`` object contains pointers to external memory, so it
109006f32e7eSjoergis not generally safe to store an instance of the class (unless you know that
109106f32e7eSjoergthe external storage will not be freed). If you need this ability, consider
109206f32e7eSjoergusing ``std::function``. ``function_ref`` is small enough that it should always
109306f32e7eSjoergbe passed by value.
109406f32e7eSjoerg
109506f32e7eSjoerg.. _DEBUG:
109606f32e7eSjoerg
109706f32e7eSjoergThe ``LLVM_DEBUG()`` macro and ``-debug`` option
109806f32e7eSjoerg------------------------------------------------
109906f32e7eSjoerg
110006f32e7eSjoergOften when working on your pass you will put a bunch of debugging printouts and
110106f32e7eSjoergother code into your pass.  After you get it working, you want to remove it, but
110206f32e7eSjoergyou may need it again in the future (to work out new bugs that you run across).
110306f32e7eSjoerg
110406f32e7eSjoergNaturally, because of this, you don't want to delete the debug printouts, but
110506f32e7eSjoergyou don't want them to always be noisy.  A standard compromise is to comment
110606f32e7eSjoergthem out, allowing you to enable them if you need them in the future.
110706f32e7eSjoerg
110806f32e7eSjoergThe ``llvm/Support/Debug.h`` (`doxygen
1109*da58b97aSjoerg<https://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
111006f32e7eSjoerg``LLVM_DEBUG()`` that is a much nicer solution to this problem.  Basically, you can
111106f32e7eSjoergput arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only
111206f32e7eSjoergexecuted if '``opt``' (or any other tool) is run with the '``-debug``' command
111306f32e7eSjoergline argument:
111406f32e7eSjoerg
111506f32e7eSjoerg.. code-block:: c++
111606f32e7eSjoerg
111706f32e7eSjoerg  LLVM_DEBUG(dbgs() << "I am here!\n");
111806f32e7eSjoerg
111906f32e7eSjoergThen you can run your pass like this:
112006f32e7eSjoerg
112106f32e7eSjoerg.. code-block:: none
112206f32e7eSjoerg
112306f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass
112406f32e7eSjoerg  <no output>
112506f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass -debug
112606f32e7eSjoerg  I am here!
112706f32e7eSjoerg
112806f32e7eSjoergUsing the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
112906f32e7eSjoerghave to create "yet another" command line option for the debug output for your
113006f32e7eSjoergpass.  Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
113106f32e7eSjoergdo not cause a performance impact at all (for the same reason, they should also
113206f32e7eSjoergnot contain side-effects!).
113306f32e7eSjoerg
113406f32e7eSjoergOne additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or
113506f32e7eSjoergdisable it directly in gdb.  Just use "``set DebugFlag=0``" or "``set
113606f32e7eSjoergDebugFlag=1``" from the gdb if the program is running.  If the program hasn't
113706f32e7eSjoergbeen started yet, you can always just run it with ``-debug``.
113806f32e7eSjoerg
113906f32e7eSjoerg.. _DEBUG_TYPE:
114006f32e7eSjoerg
114106f32e7eSjoergFine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
114206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
114306f32e7eSjoerg
114406f32e7eSjoergSometimes you may find yourself in a situation where enabling ``-debug`` just
114506f32e7eSjoergturns on **too much** information (such as when working on the code generator).
114606f32e7eSjoergIf you want to enable debug information with more fine-grained control, you
114706f32e7eSjoergshould define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as
114806f32e7eSjoergfollows:
114906f32e7eSjoerg
115006f32e7eSjoerg.. code-block:: c++
115106f32e7eSjoerg
115206f32e7eSjoerg  #define DEBUG_TYPE "foo"
115306f32e7eSjoerg  LLVM_DEBUG(dbgs() << "'foo' debug type\n");
115406f32e7eSjoerg  #undef  DEBUG_TYPE
115506f32e7eSjoerg  #define DEBUG_TYPE "bar"
115606f32e7eSjoerg  LLVM_DEBUG(dbgs() << "'bar' debug type\n");
115706f32e7eSjoerg  #undef  DEBUG_TYPE
115806f32e7eSjoerg
115906f32e7eSjoergThen you can run your pass like this:
116006f32e7eSjoerg
116106f32e7eSjoerg.. code-block:: none
116206f32e7eSjoerg
116306f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass
116406f32e7eSjoerg  <no output>
116506f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass -debug
116606f32e7eSjoerg  'foo' debug type
116706f32e7eSjoerg  'bar' debug type
116806f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass -debug-only=foo
116906f32e7eSjoerg  'foo' debug type
117006f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass -debug-only=bar
117106f32e7eSjoerg  'bar' debug type
117206f32e7eSjoerg  $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
117306f32e7eSjoerg  'foo' debug type
117406f32e7eSjoerg  'bar' debug type
117506f32e7eSjoerg
117606f32e7eSjoergOf course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
117706f32e7eSjoergto specify the debug type for the entire module. Be careful that you only do
117806f32e7eSjoergthis after including Debug.h and not around any #include of headers. Also, you
117906f32e7eSjoergshould use names more meaningful than "foo" and "bar", because there is no
118006f32e7eSjoergsystem in place to ensure that names do not conflict. If two different modules
118106f32e7eSjoerguse the same string, they will all be turned on when the name is specified.
118206f32e7eSjoergThis allows, for example, all debug information for instruction scheduling to be
118306f32e7eSjoergenabled with ``-debug-only=InstrSched``, even if the source lives in multiple
118406f32e7eSjoergfiles. The name must not include a comma (,) as that is used to separate the
118506f32e7eSjoergarguments of the ``-debug-only`` option.
118606f32e7eSjoerg
118706f32e7eSjoergFor performance reasons, -debug-only is not available in optimized build
118806f32e7eSjoerg(``--enable-optimized``) of LLVM.
118906f32e7eSjoerg
119006f32e7eSjoergThe ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
119106f32e7eSjoerglike to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement.  It
119206f32e7eSjoergtakes an additional first parameter, which is the type to use.  For example, the
119306f32e7eSjoergpreceding example could be written as:
119406f32e7eSjoerg
119506f32e7eSjoerg.. code-block:: c++
119606f32e7eSjoerg
119706f32e7eSjoerg  DEBUG_WITH_TYPE("foo", dbgs() << "'foo' debug type\n");
119806f32e7eSjoerg  DEBUG_WITH_TYPE("bar", dbgs() << "'bar' debug type\n");
119906f32e7eSjoerg
120006f32e7eSjoerg.. _Statistic:
120106f32e7eSjoerg
120206f32e7eSjoergThe ``Statistic`` class & ``-stats`` option
120306f32e7eSjoerg-------------------------------------------
120406f32e7eSjoerg
120506f32e7eSjoergThe ``llvm/ADT/Statistic.h`` (`doxygen
1206*da58b97aSjoerg<https://llvm.org/doxygen/Statistic_8h_source.html>`__) file provides a class
120706f32e7eSjoergnamed ``Statistic`` that is used as a unified way to keep track of what the LLVM
120806f32e7eSjoergcompiler is doing and how effective various optimizations are.  It is useful to
120906f32e7eSjoergsee what optimizations are contributing to making a particular program run
121006f32e7eSjoergfaster.
121106f32e7eSjoerg
121206f32e7eSjoergOften you may run your pass on some big program, and you're interested to see
121306f32e7eSjoerghow many times it makes a certain transformation.  Although you can do this with
121406f32e7eSjoerghand inspection, or some ad-hoc method, this is a real pain and not very useful
121506f32e7eSjoergfor big programs.  Using the ``Statistic`` class makes it very easy to keep
121606f32e7eSjoergtrack of this information, and the calculated information is presented in a
121706f32e7eSjoerguniform manner with the rest of the passes being executed.
121806f32e7eSjoerg
121906f32e7eSjoergThere are many examples of ``Statistic`` uses, but the basics of using it are as
122006f32e7eSjoergfollows:
122106f32e7eSjoerg
122206f32e7eSjoergDefine your statistic like this:
122306f32e7eSjoerg
122406f32e7eSjoerg.. code-block:: c++
122506f32e7eSjoerg
122606f32e7eSjoerg  #define DEBUG_TYPE "mypassname"   // This goes before any #includes.
122706f32e7eSjoerg  STATISTIC(NumXForms, "The # of times I did stuff");
122806f32e7eSjoerg
122906f32e7eSjoergThe ``STATISTIC`` macro defines a static variable, whose name is specified by
123006f32e7eSjoergthe first argument.  The pass name is taken from the ``DEBUG_TYPE`` macro, and
123106f32e7eSjoergthe description is taken from the second argument.  The variable defined
123206f32e7eSjoerg("NumXForms" in this case) acts like an unsigned integer.
123306f32e7eSjoerg
123406f32e7eSjoergWhenever you make a transformation, bump the counter:
123506f32e7eSjoerg
123606f32e7eSjoerg.. code-block:: c++
123706f32e7eSjoerg
123806f32e7eSjoerg  ++NumXForms;   // I did stuff!
123906f32e7eSjoerg
124006f32e7eSjoergThat's all you have to do.  To get '``opt``' to print out the statistics
124106f32e7eSjoerggathered, use the '``-stats``' option:
124206f32e7eSjoerg
124306f32e7eSjoerg.. code-block:: none
124406f32e7eSjoerg
124506f32e7eSjoerg  $ opt -stats -mypassname < program.bc > /dev/null
124606f32e7eSjoerg  ... statistics output ...
124706f32e7eSjoerg
124806f32e7eSjoergNote that in order to use the '``-stats``' option, LLVM must be
124906f32e7eSjoergcompiled with assertions enabled.
125006f32e7eSjoerg
125106f32e7eSjoergWhen running ``opt`` on a C file from the SPEC benchmark suite, it gives a
125206f32e7eSjoergreport that looks like this:
125306f32e7eSjoerg
125406f32e7eSjoerg.. code-block:: none
125506f32e7eSjoerg
125606f32e7eSjoerg   7646 bitcodewriter   - Number of normal instructions
125706f32e7eSjoerg    725 bitcodewriter   - Number of oversized instructions
125806f32e7eSjoerg 129996 bitcodewriter   - Number of bitcode bytes written
125906f32e7eSjoerg   2817 raise           - Number of insts DCEd or constprop'd
126006f32e7eSjoerg   3213 raise           - Number of cast-of-self removed
126106f32e7eSjoerg   5046 raise           - Number of expression trees converted
126206f32e7eSjoerg     75 raise           - Number of other getelementptr's formed
126306f32e7eSjoerg    138 raise           - Number of load/store peepholes
126406f32e7eSjoerg     42 deadtypeelim    - Number of unused typenames removed from symtab
126506f32e7eSjoerg    392 funcresolve     - Number of varargs functions resolved
126606f32e7eSjoerg     27 globaldce       - Number of global variables removed
126706f32e7eSjoerg      2 adce            - Number of basic blocks removed
126806f32e7eSjoerg    134 cee             - Number of branches revectored
126906f32e7eSjoerg     49 cee             - Number of setcc instruction eliminated
127006f32e7eSjoerg    532 gcse            - Number of loads removed
127106f32e7eSjoerg   2919 gcse            - Number of instructions removed
127206f32e7eSjoerg     86 indvars         - Number of canonical indvars added
127306f32e7eSjoerg     87 indvars         - Number of aux indvars removed
127406f32e7eSjoerg     25 instcombine     - Number of dead inst eliminate
127506f32e7eSjoerg    434 instcombine     - Number of insts combined
127606f32e7eSjoerg    248 licm            - Number of load insts hoisted
127706f32e7eSjoerg   1298 licm            - Number of insts hoisted to a loop pre-header
127806f32e7eSjoerg      3 licm            - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
127906f32e7eSjoerg     75 mem2reg         - Number of alloca's promoted
128006f32e7eSjoerg   1444 cfgsimplify     - Number of blocks simplified
128106f32e7eSjoerg
128206f32e7eSjoergObviously, with so many optimizations, having a unified framework for this stuff
128306f32e7eSjoergis very nice.  Making your pass fit well into the framework makes it more
128406f32e7eSjoergmaintainable and useful.
128506f32e7eSjoerg
128606f32e7eSjoerg.. _DebugCounters:
128706f32e7eSjoerg
128806f32e7eSjoergAdding debug counters to aid in debugging your code
128906f32e7eSjoerg---------------------------------------------------
129006f32e7eSjoerg
129106f32e7eSjoergSometimes, when writing new passes, or trying to track down bugs, it
129206f32e7eSjoergis useful to be able to control whether certain things in your pass
129306f32e7eSjoerghappen or not.  For example, there are times the minimization tooling
129406f32e7eSjoergcan only easily give you large testcases.  You would like to narrow
129506f32e7eSjoergyour bug down to a specific transformation happening or not happening,
129606f32e7eSjoergautomatically, using bisection.  This is where debug counters help.
129706f32e7eSjoergThey provide a framework for making parts of your code only execute a
129806f32e7eSjoergcertain number of times.
129906f32e7eSjoerg
130006f32e7eSjoergThe ``llvm/Support/DebugCounter.h`` (`doxygen
1301*da58b97aSjoerg<https://llvm.org/doxygen/DebugCounter_8h_source.html>`__) file
130206f32e7eSjoergprovides a class named ``DebugCounter`` that can be used to create
130306f32e7eSjoergcommand line counter options that control execution of parts of your code.
130406f32e7eSjoerg
130506f32e7eSjoergDefine your DebugCounter like this:
130606f32e7eSjoerg
130706f32e7eSjoerg.. code-block:: c++
130806f32e7eSjoerg
130906f32e7eSjoerg  DEBUG_COUNTER(DeleteAnInstruction, "passname-delete-instruction",
131006f32e7eSjoerg		"Controls which instructions get delete");
131106f32e7eSjoerg
131206f32e7eSjoergThe ``DEBUG_COUNTER`` macro defines a static variable, whose name
131306f32e7eSjoergis specified by the first argument.  The name of the counter
131406f32e7eSjoerg(which is used on the command line) is specified by the second
131506f32e7eSjoergargument, and the description used in the help is specified by the
131606f32e7eSjoergthird argument.
131706f32e7eSjoerg
131806f32e7eSjoergWhatever code you want that control, use ``DebugCounter::shouldExecute`` to control it.
131906f32e7eSjoerg
132006f32e7eSjoerg.. code-block:: c++
132106f32e7eSjoerg
132206f32e7eSjoerg  if (DebugCounter::shouldExecute(DeleteAnInstruction))
132306f32e7eSjoerg    I->eraseFromParent();
132406f32e7eSjoerg
132506f32e7eSjoergThat's all you have to do.  Now, using opt, you can control when this code triggers using
132606f32e7eSjoergthe '``--debug-counter``' option.  There are two counters provided, ``skip`` and ``count``.
132706f32e7eSjoerg``skip`` is the number of times to skip execution of the codepath.  ``count`` is the number
132806f32e7eSjoergof times, once we are done skipping, to execute the codepath.
132906f32e7eSjoerg
133006f32e7eSjoerg.. code-block:: none
133106f32e7eSjoerg
133206f32e7eSjoerg  $ opt --debug-counter=passname-delete-instruction-skip=1,passname-delete-instruction-count=2 -passname
133306f32e7eSjoerg
133406f32e7eSjoergThis will skip the above code the first time we hit it, then execute it twice, then skip the rest of the executions.
133506f32e7eSjoerg
133606f32e7eSjoergSo if executed on the following code:
133706f32e7eSjoerg
133806f32e7eSjoerg.. code-block:: llvm
133906f32e7eSjoerg
134006f32e7eSjoerg  %1 = add i32 %a, %b
134106f32e7eSjoerg  %2 = add i32 %a, %b
134206f32e7eSjoerg  %3 = add i32 %a, %b
134306f32e7eSjoerg  %4 = add i32 %a, %b
134406f32e7eSjoerg
134506f32e7eSjoergIt would delete number ``%2`` and ``%3``.
134606f32e7eSjoerg
134706f32e7eSjoergA utility is provided in `utils/bisect-skip-count` to binary search
134806f32e7eSjoergskip and count arguments. It can be used to automatically minimize the
134906f32e7eSjoergskip and count for a debug-counter variable.
135006f32e7eSjoerg
135106f32e7eSjoerg.. _ViewGraph:
135206f32e7eSjoerg
135306f32e7eSjoergViewing graphs while debugging code
135406f32e7eSjoerg-----------------------------------
135506f32e7eSjoerg
135606f32e7eSjoergSeveral of the important data structures in LLVM are graphs: for example CFGs
135706f32e7eSjoergmade out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
135806f32e7eSjoerg:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
135906f32e7eSjoergDAGs <SelectionDAG>`.  In many cases, while debugging various parts of the
136006f32e7eSjoergcompiler, it is nice to instantly visualize these graphs.
136106f32e7eSjoerg
136206f32e7eSjoergLLVM provides several callbacks that are available in a debug build to do
136306f32e7eSjoergexactly that.  If you call the ``Function::viewCFG()`` method, for example, the
136406f32e7eSjoergcurrent LLVM tool will pop up a window containing the CFG for the function where
136506f32e7eSjoergeach basic block is a node in the graph, and each node contains the instructions
136606f32e7eSjoergin the block.  Similarly, there also exists ``Function::viewCFGOnly()`` (does
136706f32e7eSjoergnot include the instructions), the ``MachineFunction::viewCFG()`` and
136806f32e7eSjoerg``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
136906f32e7eSjoergmethods.  Within GDB, for example, you can usually use something like ``call
137006f32e7eSjoergDAG.viewGraph()`` to pop up a window.  Alternatively, you can sprinkle calls to
137106f32e7eSjoergthese functions in your code in places you want to debug.
137206f32e7eSjoerg
137306f32e7eSjoergGetting this to work requires a small amount of setup.  On Unix systems
137406f32e7eSjoergwith X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
137506f32e7eSjoergsure 'dot' and 'gv' are in your path.  If you are running on macOS, download
137606f32e7eSjoergand install the macOS `Graphviz program
137706f32e7eSjoerg<http://www.pixelglow.com/graphviz/>`_ and add
137806f32e7eSjoerg``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
137906f32e7eSjoergyour path. The programs need not be present when configuring, building or
138006f32e7eSjoergrunning LLVM and can simply be installed when needed during an active debug
138106f32e7eSjoergsession.
138206f32e7eSjoerg
138306f32e7eSjoerg``SelectionDAG`` has been extended to make it easier to locate *interesting*
138406f32e7eSjoergnodes in large complex graphs.  From gdb, if you ``call DAG.setGraphColor(node,
138506f32e7eSjoerg"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
138606f32e7eSjoergthe specified color (choices of colors can be found at `colors
138706f32e7eSjoerg<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
138806f32e7eSjoergcan be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
138906f32e7eSjoergbe found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
139006f32e7eSjoergIf you want to restart and clear all the current graph attributes, then you can
139106f32e7eSjoerg``call DAG.clearGraphAttrs()``.
139206f32e7eSjoerg
139306f32e7eSjoergNote that graph visualization features are compiled out of Release builds to
139406f32e7eSjoergreduce file size.  This means that you need a Debug+Asserts or Release+Asserts
139506f32e7eSjoergbuild to use these features.
139606f32e7eSjoerg
139706f32e7eSjoerg.. _datastructure:
139806f32e7eSjoerg
139906f32e7eSjoergPicking the Right Data Structure for a Task
140006f32e7eSjoerg===========================================
140106f32e7eSjoerg
140206f32e7eSjoergLLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
140306f32e7eSjoergcommonly use STL data structures.  This section describes the trade-offs you
140406f32e7eSjoergshould consider when you pick one.
140506f32e7eSjoerg
140606f32e7eSjoergThe first step is a choose your own adventure: do you want a sequential
140706f32e7eSjoergcontainer, a set-like container, or a map-like container?  The most important
140806f32e7eSjoergthing when choosing a container is the algorithmic properties of how you plan to
140906f32e7eSjoergaccess the container.  Based on that, you should use:
141006f32e7eSjoerg
141106f32e7eSjoerg
141206f32e7eSjoerg* a :ref:`map-like <ds_map>` container if you need efficient look-up of a
141306f32e7eSjoerg  value based on another value.  Map-like containers also support efficient
141406f32e7eSjoerg  queries for containment (whether a key is in the map).  Map-like containers
141506f32e7eSjoerg  generally do not support efficient reverse mapping (values to keys).  If you
141606f32e7eSjoerg  need that, use two maps.  Some map-like containers also support efficient
141706f32e7eSjoerg  iteration through the keys in sorted order.  Map-like containers are the most
141806f32e7eSjoerg  expensive sort, only use them if you need one of these capabilities.
141906f32e7eSjoerg
142006f32e7eSjoerg* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
142106f32e7eSjoerg  a container that automatically eliminates duplicates.  Some set-like
142206f32e7eSjoerg  containers support efficient iteration through the elements in sorted order.
142306f32e7eSjoerg  Set-like containers are more expensive than sequential containers.
142406f32e7eSjoerg
142506f32e7eSjoerg* a :ref:`sequential <ds_sequential>` container provides the most efficient way
142606f32e7eSjoerg  to add elements and keeps track of the order they are added to the collection.
142706f32e7eSjoerg  They permit duplicates and support efficient iteration, but do not support
142806f32e7eSjoerg  efficient look-up based on a key.
142906f32e7eSjoerg
143006f32e7eSjoerg* a :ref:`string <ds_string>` container is a specialized sequential container or
143106f32e7eSjoerg  reference structure that is used for character or byte arrays.
143206f32e7eSjoerg
143306f32e7eSjoerg* a :ref:`bit <ds_bit>` container provides an efficient way to store and
143406f32e7eSjoerg  perform set operations on sets of numeric id's, while automatically
143506f32e7eSjoerg  eliminating duplicates.  Bit containers require a maximum of 1 bit for each
143606f32e7eSjoerg  identifier you want to store.
143706f32e7eSjoerg
143806f32e7eSjoergOnce the proper category of container is determined, you can fine tune the
143906f32e7eSjoergmemory use, constant factors, and cache behaviors of access by intelligently
144006f32e7eSjoergpicking a member of the category.  Note that constant factors and cache behavior
144106f32e7eSjoergcan be a big deal.  If you have a vector that usually only contains a few
144206f32e7eSjoergelements (but could contain many), for example, it's much better to use
144306f32e7eSjoerg:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`.  Doing so
144406f32e7eSjoergavoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
144506f32e7eSjoergthe elements to the container.
144606f32e7eSjoerg
144706f32e7eSjoerg.. _ds_sequential:
144806f32e7eSjoerg
144906f32e7eSjoergSequential Containers (std::vector, std::list, etc)
145006f32e7eSjoerg---------------------------------------------------
145106f32e7eSjoerg
145206f32e7eSjoergThere are a variety of sequential containers available for you, based on your
145306f32e7eSjoergneeds.  Pick the first in this section that will do what you want.
145406f32e7eSjoerg
145506f32e7eSjoerg.. _dss_arrayref:
145606f32e7eSjoerg
145706f32e7eSjoergllvm/ADT/ArrayRef.h
145806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
145906f32e7eSjoerg
146006f32e7eSjoergThe ``llvm::ArrayRef`` class is the preferred class to use in an interface that
146106f32e7eSjoergaccepts a sequential list of elements in memory and just reads from them.  By
146206f32e7eSjoergtaking an ``ArrayRef``, the API can be passed a fixed size array, an
146306f32e7eSjoerg``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
146406f32e7eSjoergin memory.
146506f32e7eSjoerg
146606f32e7eSjoerg.. _dss_fixedarrays:
146706f32e7eSjoerg
146806f32e7eSjoergFixed Size Arrays
146906f32e7eSjoerg^^^^^^^^^^^^^^^^^
147006f32e7eSjoerg
147106f32e7eSjoergFixed size arrays are very simple and very fast.  They are good if you know
147206f32e7eSjoergexactly how many elements you have, or you have a (low) upper bound on how many
147306f32e7eSjoergyou have.
147406f32e7eSjoerg
147506f32e7eSjoerg.. _dss_heaparrays:
147606f32e7eSjoerg
147706f32e7eSjoergHeap Allocated Arrays
147806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^
147906f32e7eSjoerg
148006f32e7eSjoergHeap allocated arrays (``new[]`` + ``delete[]``) are also simple.  They are good
148106f32e7eSjoergif the number of elements is variable, if you know how many elements you will
148206f32e7eSjoergneed before the array is allocated, and if the array is usually large (if not,
148306f32e7eSjoergconsider a :ref:`SmallVector <dss_smallvector>`).  The cost of a heap allocated
148406f32e7eSjoergarray is the cost of the new/delete (aka malloc/free).  Also note that if you
148506f32e7eSjoergare allocating an array of a type with a constructor, the constructor and
148606f32e7eSjoergdestructors will be run for every element in the array (re-sizable vectors only
148706f32e7eSjoergconstruct those elements actually used).
148806f32e7eSjoerg
148906f32e7eSjoerg.. _dss_tinyptrvector:
149006f32e7eSjoerg
149106f32e7eSjoergllvm/ADT/TinyPtrVector.h
149206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^
149306f32e7eSjoerg
149406f32e7eSjoerg``TinyPtrVector<Type>`` is a highly specialized collection class that is
149506f32e7eSjoergoptimized to avoid allocation in the case when a vector has zero or one
149606f32e7eSjoergelements.  It has two major restrictions: 1) it can only hold values of pointer
149706f32e7eSjoergtype, and 2) it cannot hold a null pointer.
149806f32e7eSjoerg
149906f32e7eSjoergSince this container is highly specialized, it is rarely used.
150006f32e7eSjoerg
150106f32e7eSjoerg.. _dss_smallvector:
150206f32e7eSjoerg
150306f32e7eSjoergllvm/ADT/SmallVector.h
150406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^
150506f32e7eSjoerg
150606f32e7eSjoerg``SmallVector<Type, N>`` is a simple class that looks and smells just like
150706f32e7eSjoerg``vector<Type>``: it supports efficient iteration, lays out elements in memory
150806f32e7eSjoergorder (so you can do pointer arithmetic between elements), supports efficient
150906f32e7eSjoergpush_back/pop_back operations, supports efficient random access to its elements,
151006f32e7eSjoergetc.
151106f32e7eSjoerg
151206f32e7eSjoergThe main advantage of SmallVector is that it allocates space for some number of
151306f32e7eSjoergelements (N) **in the object itself**.  Because of this, if the SmallVector is
151406f32e7eSjoergdynamically smaller than N, no malloc is performed.  This can be a big win in
151506f32e7eSjoergcases where the malloc/free call is far more expensive than the code that
151606f32e7eSjoergfiddles around with the elements.
151706f32e7eSjoerg
151806f32e7eSjoergThis is good for vectors that are "usually small" (e.g. the number of
151906f32e7eSjoergpredecessors/successors of a block is usually less than 8).  On the other hand,
152006f32e7eSjoergthis makes the size of the SmallVector itself large, so you don't want to
152106f32e7eSjoergallocate lots of them (doing so will waste a lot of space).  As such,
152206f32e7eSjoergSmallVectors are most useful when on the stack.
152306f32e7eSjoerg
1524*da58b97aSjoergIn the absence of a well-motivated choice for the number of
1525*da58b97aSjoerginlined elements ``N``, it is recommended to use ``SmallVector<T>`` (that is,
1526*da58b97aSjoergomitting the ``N``). This will choose a default number of
1527*da58b97aSjoerginlined elements reasonable for allocation on the stack (for example, trying
1528*da58b97aSjoergto keep ``sizeof(SmallVector<T>)`` around 64 bytes).
1529*da58b97aSjoerg
153006f32e7eSjoergSmallVector also provides a nice portable and efficient replacement for
153106f32e7eSjoerg``alloca``.
153206f32e7eSjoerg
153306f32e7eSjoergSmallVector has grown a few other minor advantages over std::vector, causing
153406f32e7eSjoerg``SmallVector<Type, 0>`` to be preferred over ``std::vector<Type>``.
153506f32e7eSjoerg
153606f32e7eSjoerg#. std::vector is exception-safe, and some implementations have pessimizations
153706f32e7eSjoerg   that copy elements when SmallVector would move them.
153806f32e7eSjoerg
1539*da58b97aSjoerg#. SmallVector understands ``std::is_trivially_copyable<Type>`` and uses realloc aggressively.
154006f32e7eSjoerg
154106f32e7eSjoerg#. Many LLVM APIs take a SmallVectorImpl as an out parameter (see the note
154206f32e7eSjoerg   below).
154306f32e7eSjoerg
154406f32e7eSjoerg#. SmallVector with N equal to 0 is smaller than std::vector on 64-bit
154506f32e7eSjoerg   platforms, since it uses ``unsigned`` (instead of ``void*``) for its size
154606f32e7eSjoerg   and capacity.
154706f32e7eSjoerg
154806f32e7eSjoerg.. note::
154906f32e7eSjoerg
1550*da58b97aSjoerg   Prefer to use ``ArrayRef<T>`` or ``SmallVectorImpl<T>`` as a parameter type.
155106f32e7eSjoerg
1552*da58b97aSjoerg   It's rarely appropriate to use ``SmallVector<T, N>`` as a parameter type.
1553*da58b97aSjoerg   If an API only reads from the vector, it should use :ref:`ArrayRef
1554*da58b97aSjoerg   <dss_arrayref>`.  Even if an API updates the vector the "small size" is
1555*da58b97aSjoerg   unlikely to be relevant; such an API should use the ``SmallVectorImpl<T>``
1556*da58b97aSjoerg   class, which is the "vector header" (and methods) without the elements
1557*da58b97aSjoerg   allocated after it. Note that ``SmallVector<T, N>`` inherits from
1558*da58b97aSjoerg   ``SmallVectorImpl<T>`` so the conversion is implicit and costs nothing. E.g.
155906f32e7eSjoerg
156006f32e7eSjoerg   .. code-block:: c++
156106f32e7eSjoerg
1562*da58b97aSjoerg      // DISCOURAGED: Clients cannot pass e.g. raw arrays.
1563*da58b97aSjoerg      hardcodedContiguousStorage(const SmallVectorImpl<Foo> &In);
1564*da58b97aSjoerg      // ENCOURAGED: Clients can pass any contiguous storage of Foo.
1565*da58b97aSjoerg      allowsAnyContiguousStorage(ArrayRef<Foo> In);
1566*da58b97aSjoerg
1567*da58b97aSjoerg      void someFunc1() {
1568*da58b97aSjoerg        Foo Vec[] = { /* ... */ };
1569*da58b97aSjoerg        hardcodedContiguousStorage(Vec); // Error.
1570*da58b97aSjoerg        allowsAnyContiguousStorage(Vec); // Works.
1571*da58b97aSjoerg      }
1572*da58b97aSjoerg
1573*da58b97aSjoerg      // DISCOURAGED: Clients cannot pass e.g. SmallVector<Foo, 8>.
157406f32e7eSjoerg      hardcodedSmallSize(SmallVector<Foo, 2> &Out);
1575*da58b97aSjoerg      // ENCOURAGED: Clients can pass any SmallVector<Foo, N>.
157606f32e7eSjoerg      allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
157706f32e7eSjoerg
1578*da58b97aSjoerg      void someFunc2() {
157906f32e7eSjoerg        SmallVector<Foo, 8> Vec;
158006f32e7eSjoerg        hardcodedSmallSize(Vec); // Error.
158106f32e7eSjoerg        allowsAnySmallSize(Vec); // Works.
158206f32e7eSjoerg      }
158306f32e7eSjoerg
1584*da58b97aSjoerg   Even though it has "``Impl``" in the name, SmallVectorImpl is widely used
1585*da58b97aSjoerg   and is no longer "private to the implementation". A name like
1586*da58b97aSjoerg   ``SmallVectorHeader`` might be more appropriate.
158706f32e7eSjoerg
158806f32e7eSjoerg.. _dss_vector:
158906f32e7eSjoerg
159006f32e7eSjoerg<vector>
159106f32e7eSjoerg^^^^^^^^
159206f32e7eSjoerg
159306f32e7eSjoerg``std::vector<T>`` is well loved and respected.  However, ``SmallVector<T, 0>``
159406f32e7eSjoergis often a better option due to the advantages listed above.  std::vector is
159506f32e7eSjoergstill useful when you need to store more than ``UINT32_MAX`` elements or when
159606f32e7eSjoerginterfacing with code that expects vectors :).
159706f32e7eSjoerg
159806f32e7eSjoergOne worthwhile note about std::vector: avoid code like this:
159906f32e7eSjoerg
160006f32e7eSjoerg.. code-block:: c++
160106f32e7eSjoerg
160206f32e7eSjoerg  for ( ... ) {
160306f32e7eSjoerg     std::vector<foo> V;
160406f32e7eSjoerg     // make use of V.
160506f32e7eSjoerg  }
160606f32e7eSjoerg
160706f32e7eSjoergInstead, write this as:
160806f32e7eSjoerg
160906f32e7eSjoerg.. code-block:: c++
161006f32e7eSjoerg
161106f32e7eSjoerg  std::vector<foo> V;
161206f32e7eSjoerg  for ( ... ) {
161306f32e7eSjoerg     // make use of V.
161406f32e7eSjoerg     V.clear();
161506f32e7eSjoerg  }
161606f32e7eSjoerg
161706f32e7eSjoergDoing so will save (at least) one heap allocation and free per iteration of the
161806f32e7eSjoergloop.
161906f32e7eSjoerg
162006f32e7eSjoerg.. _dss_deque:
162106f32e7eSjoerg
162206f32e7eSjoerg<deque>
162306f32e7eSjoerg^^^^^^^
162406f32e7eSjoerg
162506f32e7eSjoerg``std::deque`` is, in some senses, a generalized version of ``std::vector``.
162606f32e7eSjoergLike ``std::vector``, it provides constant time random access and other similar
162706f32e7eSjoergproperties, but it also provides efficient access to the front of the list.  It
162806f32e7eSjoergdoes not guarantee continuity of elements within memory.
162906f32e7eSjoerg
163006f32e7eSjoergIn exchange for this extra flexibility, ``std::deque`` has significantly higher
163106f32e7eSjoergconstant factor costs than ``std::vector``.  If possible, use ``std::vector`` or
163206f32e7eSjoergsomething cheaper.
163306f32e7eSjoerg
163406f32e7eSjoerg.. _dss_list:
163506f32e7eSjoerg
163606f32e7eSjoerg<list>
163706f32e7eSjoerg^^^^^^
163806f32e7eSjoerg
163906f32e7eSjoerg``std::list`` is an extremely inefficient class that is rarely useful.  It
164006f32e7eSjoergperforms a heap allocation for every element inserted into it, thus having an
164106f32e7eSjoergextremely high constant factor, particularly for small data types.
164206f32e7eSjoerg``std::list`` also only supports bidirectional iteration, not random access
164306f32e7eSjoergiteration.
164406f32e7eSjoerg
164506f32e7eSjoergIn exchange for this high cost, std::list supports efficient access to both ends
164606f32e7eSjoergof the list (like ``std::deque``, but unlike ``std::vector`` or
164706f32e7eSjoerg``SmallVector``).  In addition, the iterator invalidation characteristics of
164806f32e7eSjoergstd::list are stronger than that of a vector class: inserting or removing an
164906f32e7eSjoergelement into the list does not invalidate iterator or pointers to other elements
165006f32e7eSjoergin the list.
165106f32e7eSjoerg
165206f32e7eSjoerg.. _dss_ilist:
165306f32e7eSjoerg
165406f32e7eSjoergllvm/ADT/ilist.h
165506f32e7eSjoerg^^^^^^^^^^^^^^^^
165606f32e7eSjoerg
165706f32e7eSjoerg``ilist<T>`` implements an 'intrusive' doubly-linked list.  It is intrusive,
165806f32e7eSjoergbecause it requires the element to store and provide access to the prev/next
165906f32e7eSjoergpointers for the list.
166006f32e7eSjoerg
166106f32e7eSjoerg``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
166206f32e7eSjoerg``ilist_traits`` implementation for the element type, but it provides some novel
166306f32e7eSjoergcharacteristics.  In particular, it can efficiently store polymorphic objects,
166406f32e7eSjoergthe traits class is informed when an element is inserted or removed from the
166506f32e7eSjoerglist, and ``ilist``\ s are guaranteed to support a constant-time splice
166606f32e7eSjoergoperation.
166706f32e7eSjoerg
166806f32e7eSjoergThese properties are exactly what we want for things like ``Instruction``\ s and
166906f32e7eSjoergbasic blocks, which is why these are implemented with ``ilist``\ s.
167006f32e7eSjoerg
167106f32e7eSjoergRelated classes of interest are explained in the following subsections:
167206f32e7eSjoerg
167306f32e7eSjoerg* :ref:`ilist_traits <dss_ilist_traits>`
167406f32e7eSjoerg
167506f32e7eSjoerg* :ref:`iplist <dss_iplist>`
167606f32e7eSjoerg
167706f32e7eSjoerg* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
167806f32e7eSjoerg
167906f32e7eSjoerg* :ref:`Sentinels <dss_ilist_sentinel>`
168006f32e7eSjoerg
168106f32e7eSjoerg.. _dss_packedvector:
168206f32e7eSjoerg
168306f32e7eSjoergllvm/ADT/PackedVector.h
168406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
168506f32e7eSjoerg
168606f32e7eSjoergUseful for storing a vector of values using only a few number of bits for each
168706f32e7eSjoergvalue.  Apart from the standard operations of a vector-like container, it can
168806f32e7eSjoergalso perform an 'or' set operation.
168906f32e7eSjoerg
169006f32e7eSjoergFor example:
169106f32e7eSjoerg
169206f32e7eSjoerg.. code-block:: c++
169306f32e7eSjoerg
169406f32e7eSjoerg  enum State {
169506f32e7eSjoerg      None = 0x0,
169606f32e7eSjoerg      FirstCondition = 0x1,
169706f32e7eSjoerg      SecondCondition = 0x2,
169806f32e7eSjoerg      Both = 0x3
169906f32e7eSjoerg  };
170006f32e7eSjoerg
170106f32e7eSjoerg  State get() {
170206f32e7eSjoerg      PackedVector<State, 2> Vec1;
170306f32e7eSjoerg      Vec1.push_back(FirstCondition);
170406f32e7eSjoerg
170506f32e7eSjoerg      PackedVector<State, 2> Vec2;
170606f32e7eSjoerg      Vec2.push_back(SecondCondition);
170706f32e7eSjoerg
170806f32e7eSjoerg      Vec1 |= Vec2;
170906f32e7eSjoerg      return Vec1[0]; // returns 'Both'.
171006f32e7eSjoerg  }
171106f32e7eSjoerg
171206f32e7eSjoerg.. _dss_ilist_traits:
171306f32e7eSjoerg
171406f32e7eSjoergilist_traits
171506f32e7eSjoerg^^^^^^^^^^^^
171606f32e7eSjoerg
171706f32e7eSjoerg``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
171806f32e7eSjoerg(and consequently ``ilist<T>``) publicly derive from this traits class.
171906f32e7eSjoerg
172006f32e7eSjoerg.. _dss_iplist:
172106f32e7eSjoerg
172206f32e7eSjoergiplist
172306f32e7eSjoerg^^^^^^
172406f32e7eSjoerg
172506f32e7eSjoerg``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
172606f32e7eSjoerginterface.  Notably, inserters from ``T&`` are absent.
172706f32e7eSjoerg
172806f32e7eSjoerg``ilist_traits<T>`` is a public base of this class and can be used for a wide
172906f32e7eSjoergvariety of customizations.
173006f32e7eSjoerg
173106f32e7eSjoerg.. _dss_ilist_node:
173206f32e7eSjoerg
173306f32e7eSjoergllvm/ADT/ilist_node.h
173406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^
173506f32e7eSjoerg
173606f32e7eSjoerg``ilist_node<T>`` implements the forward and backward links that are expected
173706f32e7eSjoergby the ``ilist<T>`` (and analogous containers) in the default manner.
173806f32e7eSjoerg
173906f32e7eSjoerg``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
174006f32e7eSjoerg``T`` publicly derives from ``ilist_node<T>``.
174106f32e7eSjoerg
174206f32e7eSjoerg.. _dss_ilist_sentinel:
174306f32e7eSjoerg
174406f32e7eSjoergSentinels
174506f32e7eSjoerg^^^^^^^^^
174606f32e7eSjoerg
174706f32e7eSjoerg``ilist``\ s have another specialty that must be considered.  To be a good
174806f32e7eSjoergcitizen in the C++ ecosystem, it needs to support the standard container
174906f32e7eSjoergoperations, such as ``begin`` and ``end`` iterators, etc.  Also, the
175006f32e7eSjoerg``operator--`` must work correctly on the ``end`` iterator in the case of
175106f32e7eSjoergnon-empty ``ilist``\ s.
175206f32e7eSjoerg
175306f32e7eSjoergThe only sensible solution to this problem is to allocate a so-called *sentinel*
175406f32e7eSjoergalong with the intrusive list, which serves as the ``end`` iterator, providing
175506f32e7eSjoergthe back-link to the last element.  However conforming to the C++ convention it
175606f32e7eSjoergis illegal to ``operator++`` beyond the sentinel and it also must not be
175706f32e7eSjoergdereferenced.
175806f32e7eSjoerg
175906f32e7eSjoergThese constraints allow for some implementation freedom to the ``ilist`` how to
176006f32e7eSjoergallocate and store the sentinel.  The corresponding policy is dictated by
176106f32e7eSjoerg``ilist_traits<T>``.  By default a ``T`` gets heap-allocated whenever the need
176206f32e7eSjoergfor a sentinel arises.
176306f32e7eSjoerg
176406f32e7eSjoergWhile the default policy is sufficient in most cases, it may break down when
176506f32e7eSjoerg``T`` does not provide a default constructor.  Also, in the case of many
176606f32e7eSjoerginstances of ``ilist``\ s, the memory overhead of the associated sentinels is
176706f32e7eSjoergwasted.  To alleviate the situation with numerous and voluminous
176806f32e7eSjoerg``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
176906f32e7eSjoerg
177006f32e7eSjoergGhostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
177106f32e7eSjoergsuperpose the sentinel with the ``ilist`` instance in memory.  Pointer
177206f32e7eSjoergarithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
177306f32e7eSjoerg``this`` pointer.  The ``ilist`` is augmented by an extra pointer, which serves
177406f32e7eSjoergas the back-link of the sentinel.  This is the only field in the ghostly
177506f32e7eSjoergsentinel which can be legally accessed.
177606f32e7eSjoerg
177706f32e7eSjoerg.. _dss_other:
177806f32e7eSjoerg
177906f32e7eSjoergOther Sequential Container options
178006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
178106f32e7eSjoerg
178206f32e7eSjoergOther STL containers are available, such as ``std::string``.
178306f32e7eSjoerg
178406f32e7eSjoergThere are also various STL adapter classes such as ``std::queue``,
178506f32e7eSjoerg``std::priority_queue``, ``std::stack``, etc.  These provide simplified access
178606f32e7eSjoergto an underlying container but don't affect the cost of the container itself.
178706f32e7eSjoerg
178806f32e7eSjoerg.. _ds_string:
178906f32e7eSjoerg
179006f32e7eSjoergString-like containers
179106f32e7eSjoerg----------------------
179206f32e7eSjoerg
179306f32e7eSjoergThere are a variety of ways to pass around and use strings in C and C++, and
179406f32e7eSjoergLLVM adds a few new options to choose from.  Pick the first option on this list
179506f32e7eSjoergthat will do what you need, they are ordered according to their relative cost.
179606f32e7eSjoerg
179706f32e7eSjoergNote that it is generally preferred to *not* pass strings around as ``const
179806f32e7eSjoergchar*``'s.  These have a number of problems, including the fact that they
179906f32e7eSjoergcannot represent embedded nul ("\0") characters, and do not have a length
180006f32e7eSjoergavailable efficiently.  The general replacement for '``const char*``' is
180106f32e7eSjoergStringRef.
180206f32e7eSjoerg
180306f32e7eSjoergFor more information on choosing string containers for APIs, please see
180406f32e7eSjoerg:ref:`Passing Strings <string_apis>`.
180506f32e7eSjoerg
180606f32e7eSjoerg.. _dss_stringref:
180706f32e7eSjoerg
180806f32e7eSjoergllvm/ADT/StringRef.h
180906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
181006f32e7eSjoerg
181106f32e7eSjoergThe StringRef class is a simple value class that contains a pointer to a
181206f32e7eSjoergcharacter and a length, and is quite related to the :ref:`ArrayRef
181306f32e7eSjoerg<dss_arrayref>` class (but specialized for arrays of characters).  Because
181406f32e7eSjoergStringRef carries a length with it, it safely handles strings with embedded nul
181506f32e7eSjoergcharacters in it, getting the length does not require a strlen call, and it even
181606f32e7eSjoerghas very convenient APIs for slicing and dicing the character range that it
181706f32e7eSjoergrepresents.
181806f32e7eSjoerg
181906f32e7eSjoergStringRef is ideal for passing simple strings around that are known to be live,
182006f32e7eSjoergeither because they are C string literals, std::string, a C array, or a
182106f32e7eSjoergSmallVector.  Each of these cases has an efficient implicit conversion to
182206f32e7eSjoergStringRef, which doesn't result in a dynamic strlen being executed.
182306f32e7eSjoerg
182406f32e7eSjoergStringRef has a few major limitations which make more powerful string containers
182506f32e7eSjoerguseful:
182606f32e7eSjoerg
182706f32e7eSjoerg#. You cannot directly convert a StringRef to a 'const char*' because there is
182806f32e7eSjoerg   no way to add a trailing nul (unlike the .c_str() method on various stronger
182906f32e7eSjoerg   classes).
183006f32e7eSjoerg
183106f32e7eSjoerg#. StringRef doesn't own or keep alive the underlying string bytes.
183206f32e7eSjoerg   As such it can easily lead to dangling pointers, and is not suitable for
183306f32e7eSjoerg   embedding in datastructures in most cases (instead, use an std::string or
183406f32e7eSjoerg   something like that).
183506f32e7eSjoerg
183606f32e7eSjoerg#. For the same reason, StringRef cannot be used as the return value of a
183706f32e7eSjoerg   method if the method "computes" the result string.  Instead, use std::string.
183806f32e7eSjoerg
183906f32e7eSjoerg#. StringRef's do not allow you to mutate the pointed-to string bytes and it
184006f32e7eSjoerg   doesn't allow you to insert or remove bytes from the range.  For editing
184106f32e7eSjoerg   operations like this, it interoperates with the :ref:`Twine <dss_twine>`
184206f32e7eSjoerg   class.
184306f32e7eSjoerg
184406f32e7eSjoergBecause of its strengths and limitations, it is very common for a function to
184506f32e7eSjoergtake a StringRef and for a method on an object to return a StringRef that points
184606f32e7eSjoerginto some string that it owns.
184706f32e7eSjoerg
184806f32e7eSjoerg.. _dss_twine:
184906f32e7eSjoerg
185006f32e7eSjoergllvm/ADT/Twine.h
185106f32e7eSjoerg^^^^^^^^^^^^^^^^
185206f32e7eSjoerg
185306f32e7eSjoergThe Twine class is used as an intermediary datatype for APIs that want to take a
185406f32e7eSjoergstring that can be constructed inline with a series of concatenations.  Twine
185506f32e7eSjoergworks by forming recursive instances of the Twine datatype (a simple value
185606f32e7eSjoergobject) on the stack as temporary objects, linking them together into a tree
185706f32e7eSjoergwhich is then linearized when the Twine is consumed.  Twine is only safe to use
185806f32e7eSjoergas the argument to a function, and should always be a const reference, e.g.:
185906f32e7eSjoerg
186006f32e7eSjoerg.. code-block:: c++
186106f32e7eSjoerg
186206f32e7eSjoerg  void foo(const Twine &T);
186306f32e7eSjoerg  ...
186406f32e7eSjoerg  StringRef X = ...
186506f32e7eSjoerg  unsigned i = ...
186606f32e7eSjoerg  foo(X + "." + Twine(i));
186706f32e7eSjoerg
186806f32e7eSjoergThis example forms a string like "blarg.42" by concatenating the values
186906f32e7eSjoergtogether, and does not form intermediate strings containing "blarg" or "blarg.".
187006f32e7eSjoerg
187106f32e7eSjoergBecause Twine is constructed with temporary objects on the stack, and because
187206f32e7eSjoergthese instances are destroyed at the end of the current statement, it is an
187306f32e7eSjoerginherently dangerous API.  For example, this simple variant contains undefined
187406f32e7eSjoergbehavior and will probably crash:
187506f32e7eSjoerg
187606f32e7eSjoerg.. code-block:: c++
187706f32e7eSjoerg
187806f32e7eSjoerg  void foo(const Twine &T);
187906f32e7eSjoerg  ...
188006f32e7eSjoerg  StringRef X = ...
188106f32e7eSjoerg  unsigned i = ...
188206f32e7eSjoerg  const Twine &Tmp = X + "." + Twine(i);
188306f32e7eSjoerg  foo(Tmp);
188406f32e7eSjoerg
188506f32e7eSjoerg... because the temporaries are destroyed before the call.  That said, Twine's
188606f32e7eSjoergare much more efficient than intermediate std::string temporaries, and they work
188706f32e7eSjoergreally well with StringRef.  Just be aware of their limitations.
188806f32e7eSjoerg
188906f32e7eSjoerg.. _dss_smallstring:
189006f32e7eSjoerg
189106f32e7eSjoergllvm/ADT/SmallString.h
189206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^
189306f32e7eSjoerg
189406f32e7eSjoergSmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some
189506f32e7eSjoergconvenience APIs like += that takes StringRef's.  SmallString avoids allocating
189606f32e7eSjoergmemory in the case when the preallocated space is enough to hold its data, and
189706f32e7eSjoergit calls back to general heap allocation when required.  Since it owns its data,
189806f32e7eSjoergit is very safe to use and supports full mutation of the string.
189906f32e7eSjoerg
190006f32e7eSjoergLike SmallVector's, the big downside to SmallString is their sizeof.  While they
190106f32e7eSjoergare optimized for small strings, they themselves are not particularly small.
190206f32e7eSjoergThis means that they work great for temporary scratch buffers on the stack, but
190306f32e7eSjoergshould not generally be put into the heap: it is very rare to see a SmallString
190406f32e7eSjoergas the member of a frequently-allocated heap data structure or returned
190506f32e7eSjoergby-value.
190606f32e7eSjoerg
190706f32e7eSjoerg.. _dss_stdstring:
190806f32e7eSjoerg
190906f32e7eSjoergstd::string
191006f32e7eSjoerg^^^^^^^^^^^
191106f32e7eSjoerg
191206f32e7eSjoergThe standard C++ std::string class is a very general class that (like
191306f32e7eSjoergSmallString) owns its underlying data.  sizeof(std::string) is very reasonable
191406f32e7eSjoergso it can be embedded into heap data structures and returned by-value.  On the
191506f32e7eSjoergother hand, std::string is highly inefficient for inline editing (e.g.
191606f32e7eSjoergconcatenating a bunch of stuff together) and because it is provided by the
191706f32e7eSjoergstandard library, its performance characteristics depend a lot of the host
191806f32e7eSjoergstandard library (e.g. libc++ and MSVC provide a highly optimized string class,
191906f32e7eSjoergGCC contains a really slow implementation).
192006f32e7eSjoerg
192106f32e7eSjoergThe major disadvantage of std::string is that almost every operation that makes
192206f32e7eSjoergthem larger can allocate memory, which is slow.  As such, it is better to use
192306f32e7eSjoergSmallVector or Twine as a scratch buffer, but then use std::string to persist
192406f32e7eSjoergthe result.
192506f32e7eSjoerg
192606f32e7eSjoerg.. _ds_set:
192706f32e7eSjoerg
192806f32e7eSjoergSet-Like Containers (std::set, SmallSet, SetVector, etc)
192906f32e7eSjoerg--------------------------------------------------------
193006f32e7eSjoerg
193106f32e7eSjoergSet-like containers are useful when you need to canonicalize multiple values
193206f32e7eSjoerginto a single representation.  There are several different choices for how to do
193306f32e7eSjoergthis, providing various trade-offs.
193406f32e7eSjoerg
193506f32e7eSjoerg.. _dss_sortedvectorset:
193606f32e7eSjoerg
193706f32e7eSjoergA sorted 'vector'
193806f32e7eSjoerg^^^^^^^^^^^^^^^^^
193906f32e7eSjoerg
194006f32e7eSjoergIf you intend to insert a lot of elements, then do a lot of queries, a great
194106f32e7eSjoergapproach is to use an std::vector (or other sequential container) with
194206f32e7eSjoergstd::sort+std::unique to remove duplicates.  This approach works really well if
194306f32e7eSjoergyour usage pattern has these two distinct phases (insert then query), and can be
194406f32e7eSjoergcoupled with a good choice of :ref:`sequential container <ds_sequential>`.
194506f32e7eSjoerg
194606f32e7eSjoergThis combination provides the several nice properties: the result data is
194706f32e7eSjoergcontiguous in memory (good for cache locality), has few allocations, is easy to
194806f32e7eSjoergaddress (iterators in the final vector are just indices or pointers), and can be
194906f32e7eSjoergefficiently queried with a standard binary search (e.g.
195006f32e7eSjoerg``std::lower_bound``; if you want the whole range of elements comparing
195106f32e7eSjoergequal, use ``std::equal_range``).
195206f32e7eSjoerg
195306f32e7eSjoerg.. _dss_smallset:
195406f32e7eSjoerg
195506f32e7eSjoergllvm/ADT/SmallSet.h
195606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
195706f32e7eSjoerg
195806f32e7eSjoergIf you have a set-like data structure that is usually small and whose elements
195906f32e7eSjoergare reasonably small, a ``SmallSet<Type, N>`` is a good choice.  This set has
196006f32e7eSjoergspace for N elements in place (thus, if the set is dynamically smaller than N,
196106f32e7eSjoergno malloc traffic is required) and accesses them with a simple linear search.
196206f32e7eSjoergWhen the set grows beyond N elements, it allocates a more expensive
196306f32e7eSjoergrepresentation that guarantees efficient access (for most types, it falls back
196406f32e7eSjoergto :ref:`std::set <dss_set>`, but for pointers it uses something far better,
196506f32e7eSjoerg:ref:`SmallPtrSet <dss_smallptrset>`.
196606f32e7eSjoerg
196706f32e7eSjoergThe magic of this class is that it handles small sets extremely efficiently, but
196806f32e7eSjoerggracefully handles extremely large sets without loss of efficiency.
196906f32e7eSjoerg
197006f32e7eSjoerg.. _dss_smallptrset:
197106f32e7eSjoerg
197206f32e7eSjoergllvm/ADT/SmallPtrSet.h
197306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^
197406f32e7eSjoerg
197506f32e7eSjoerg``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of
197606f32e7eSjoergpointers is transparently implemented with a ``SmallPtrSet``). If more than N
197706f32e7eSjoerginsertions are performed, a single quadratically probed hash table is allocated
197806f32e7eSjoergand grows as needed, providing extremely efficient access (constant time
197906f32e7eSjoerginsertion/deleting/queries with low constant factors) and is very stingy with
198006f32e7eSjoergmalloc traffic.
198106f32e7eSjoerg
198206f32e7eSjoergNote that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet``
198306f32e7eSjoergare invalidated whenever an insertion occurs.  Also, the values visited by the
198406f32e7eSjoergiterators are not visited in sorted order.
198506f32e7eSjoerg
198606f32e7eSjoerg.. _dss_stringset:
198706f32e7eSjoerg
198806f32e7eSjoergllvm/ADT/StringSet.h
198906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
199006f32e7eSjoerg
199106f32e7eSjoerg``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`,
199206f32e7eSjoergand it allows efficient storage and retrieval of unique strings.
199306f32e7eSjoerg
199406f32e7eSjoergFunctionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports
199506f32e7eSjoergiteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you
199606f32e7eSjoergneed to call ``i->getKey()`` to access the item of the StringSet.)  On the
199706f32e7eSjoergother hand, ``StringSet`` doesn't support range-insertion and
199806f32e7eSjoergcopy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet
199906f32e7eSjoerg<dss_smallptrset>` do support.
200006f32e7eSjoerg
200106f32e7eSjoerg.. _dss_denseset:
200206f32e7eSjoerg
200306f32e7eSjoergllvm/ADT/DenseSet.h
200406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
200506f32e7eSjoerg
200606f32e7eSjoergDenseSet is a simple quadratically probed hash table.  It excels at supporting
200706f32e7eSjoergsmall values: it uses a single allocation to hold all of the pairs that are
200806f32e7eSjoergcurrently inserted in the set.  DenseSet is a great way to unique small values
200906f32e7eSjoergthat are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for
201006f32e7eSjoergpointers).  Note that DenseSet has the same requirements for the value type that
201106f32e7eSjoerg:ref:`DenseMap <dss_densemap>` has.
201206f32e7eSjoerg
201306f32e7eSjoerg.. _dss_sparseset:
201406f32e7eSjoerg
201506f32e7eSjoergllvm/ADT/SparseSet.h
201606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
201706f32e7eSjoerg
201806f32e7eSjoergSparseSet holds a small number of objects identified by unsigned keys of
201906f32e7eSjoergmoderate size.  It uses a lot of memory, but provides operations that are almost
202006f32e7eSjoergas fast as a vector.  Typical keys are physical registers, virtual registers, or
202106f32e7eSjoergnumbered basic blocks.
202206f32e7eSjoerg
202306f32e7eSjoergSparseSet is useful for algorithms that need very fast clear/find/insert/erase
202406f32e7eSjoergand fast iteration over small sets.  It is not intended for building composite
202506f32e7eSjoergdata structures.
202606f32e7eSjoerg
202706f32e7eSjoerg.. _dss_sparsemultiset:
202806f32e7eSjoerg
202906f32e7eSjoergllvm/ADT/SparseMultiSet.h
203006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^
203106f32e7eSjoerg
203206f32e7eSjoergSparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's
203306f32e7eSjoergdesirable attributes. Like SparseSet, it typically uses a lot of memory, but
203406f32e7eSjoergprovides operations that are almost as fast as a vector.  Typical keys are
203506f32e7eSjoergphysical registers, virtual registers, or numbered basic blocks.
203606f32e7eSjoerg
203706f32e7eSjoergSparseMultiSet is useful for algorithms that need very fast
203806f32e7eSjoergclear/find/insert/erase of the entire collection, and iteration over sets of
203906f32e7eSjoergelements sharing a key. It is often a more efficient choice than using composite
204006f32e7eSjoergdata structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for
204106f32e7eSjoergbuilding composite data structures.
204206f32e7eSjoerg
204306f32e7eSjoerg.. _dss_FoldingSet:
204406f32e7eSjoerg
204506f32e7eSjoergllvm/ADT/FoldingSet.h
204606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^
204706f32e7eSjoerg
204806f32e7eSjoergFoldingSet is an aggregate class that is really good at uniquing
204906f32e7eSjoergexpensive-to-create or polymorphic objects.  It is a combination of a chained
205006f32e7eSjoerghash table with intrusive links (uniqued objects are required to inherit from
205106f32e7eSjoergFoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID
205206f32e7eSjoergprocess.
205306f32e7eSjoerg
205406f32e7eSjoergConsider a case where you want to implement a "getOrCreateFoo" method for a
205506f32e7eSjoergcomplex object (for example, a node in the code generator).  The client has a
205606f32e7eSjoergdescription of **what** it wants to generate (it knows the opcode and all the
205706f32e7eSjoergoperands), but we don't want to 'new' a node, then try inserting it into a set
205806f32e7eSjoergonly to find out it already exists, at which point we would have to delete it
205906f32e7eSjoergand return the node that already exists.
206006f32e7eSjoerg
206106f32e7eSjoergTo support this style of client, FoldingSet perform a query with a
206206f32e7eSjoergFoldingSetNodeID (which wraps SmallVector) that can be used to describe the
206306f32e7eSjoergelement that we want to query for.  The query either returns the element
206406f32e7eSjoergmatching the ID or it returns an opaque ID that indicates where insertion should
206506f32e7eSjoergtake place.  Construction of the ID usually does not require heap traffic.
206606f32e7eSjoerg
206706f32e7eSjoergBecause FoldingSet uses intrusive links, it can support polymorphic objects in
206806f32e7eSjoergthe set (for example, you can have SDNode instances mixed with LoadSDNodes).
206906f32e7eSjoergBecause the elements are individually allocated, pointers to the elements are
207006f32e7eSjoergstable: inserting or removing elements does not invalidate any pointers to other
207106f32e7eSjoergelements.
207206f32e7eSjoerg
207306f32e7eSjoerg.. _dss_set:
207406f32e7eSjoerg
207506f32e7eSjoerg<set>
207606f32e7eSjoerg^^^^^
207706f32e7eSjoerg
207806f32e7eSjoerg``std::set`` is a reasonable all-around set class, which is decent at many
207906f32e7eSjoergthings but great at nothing.  std::set allocates memory for each element
208006f32e7eSjoerginserted (thus it is very malloc intensive) and typically stores three pointers
208106f32e7eSjoergper element in the set (thus adding a large amount of per-element space
208206f32e7eSjoergoverhead).  It offers guaranteed log(n) performance, which is not particularly
208306f32e7eSjoergfast from a complexity standpoint (particularly if the elements of the set are
208406f32e7eSjoergexpensive to compare, like strings), and has extremely high constant factors for
208506f32e7eSjoerglookup, insertion and removal.
208606f32e7eSjoerg
208706f32e7eSjoergThe advantages of std::set are that its iterators are stable (deleting or
208806f32e7eSjoerginserting an element from the set does not affect iterators or pointers to other
208906f32e7eSjoergelements) and that iteration over the set is guaranteed to be in sorted order.
209006f32e7eSjoergIf the elements in the set are large, then the relative overhead of the pointers
209106f32e7eSjoergand malloc traffic is not a big deal, but if the elements of the set are small,
209206f32e7eSjoergstd::set is almost never a good choice.
209306f32e7eSjoerg
209406f32e7eSjoerg.. _dss_setvector:
209506f32e7eSjoerg
209606f32e7eSjoergllvm/ADT/SetVector.h
209706f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
209806f32e7eSjoerg
209906f32e7eSjoergLLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a
210006f32e7eSjoergset-like container along with a :ref:`Sequential Container <ds_sequential>` The
210106f32e7eSjoergimportant property that this provides is efficient insertion with uniquing
210206f32e7eSjoerg(duplicate elements are ignored) with iteration support.  It implements this by
210306f32e7eSjoerginserting elements into both a set-like container and the sequential container,
210406f32e7eSjoergusing the set-like container for uniquing and the sequential container for
210506f32e7eSjoergiteration.
210606f32e7eSjoerg
210706f32e7eSjoergThe difference between SetVector and other sets is that the order of iteration
210806f32e7eSjoergis guaranteed to match the order of insertion into the SetVector.  This property
210906f32e7eSjoergis really important for things like sets of pointers.  Because pointer values
211006f32e7eSjoergare non-deterministic (e.g. vary across runs of the program on different
211106f32e7eSjoergmachines), iterating over the pointers in the set will not be in a well-defined
211206f32e7eSjoergorder.
211306f32e7eSjoerg
211406f32e7eSjoergThe drawback of SetVector is that it requires twice as much space as a normal
211506f32e7eSjoergset and has the sum of constant factors from the set-like container and the
211606f32e7eSjoergsequential container that it uses.  Use it **only** if you need to iterate over
211706f32e7eSjoergthe elements in a deterministic order.  SetVector is also expensive to delete
211806f32e7eSjoergelements out of (linear time), unless you use its "pop_back" method, which is
211906f32e7eSjoergfaster.
212006f32e7eSjoerg
212106f32e7eSjoerg``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
212206f32e7eSjoergsize 16 ``SmallSet`` for the underlying containers, so it is quite expensive.
212306f32e7eSjoergHowever, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class,
212406f32e7eSjoergwhich defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size.
212506f32e7eSjoergIf you use this, and if your sets are dynamically smaller than ``N``, you will
212606f32e7eSjoergsave a lot of heap traffic.
212706f32e7eSjoerg
212806f32e7eSjoerg.. _dss_uniquevector:
212906f32e7eSjoerg
213006f32e7eSjoergllvm/ADT/UniqueVector.h
213106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
213206f32e7eSjoerg
213306f32e7eSjoergUniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a
213406f32e7eSjoergunique ID for each element inserted into the set.  It internally contains a map
213506f32e7eSjoergand a vector, and it assigns a unique ID for each value inserted into the set.
213606f32e7eSjoerg
213706f32e7eSjoergUniqueVector is very expensive: its cost is the sum of the cost of maintaining
213806f32e7eSjoergboth the map and vector, it has high complexity, high constant factors, and
213906f32e7eSjoergproduces a lot of malloc traffic.  It should be avoided.
214006f32e7eSjoerg
214106f32e7eSjoerg.. _dss_immutableset:
214206f32e7eSjoerg
214306f32e7eSjoergllvm/ADT/ImmutableSet.h
214406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
214506f32e7eSjoerg
214606f32e7eSjoergImmutableSet is an immutable (functional) set implementation based on an AVL
214706f32e7eSjoergtree.  Adding or removing elements is done through a Factory object and results
214806f32e7eSjoergin the creation of a new ImmutableSet object.  If an ImmutableSet already exists
214906f32e7eSjoergwith the given contents, then the existing one is returned; equality is compared
215006f32e7eSjoergwith a FoldingSetNodeID.  The time and space complexity of add or remove
215106f32e7eSjoergoperations is logarithmic in the size of the original set.
215206f32e7eSjoerg
215306f32e7eSjoergThere is no method for returning an element of the set, you can only check for
215406f32e7eSjoergmembership.
215506f32e7eSjoerg
215606f32e7eSjoerg.. _dss_otherset:
215706f32e7eSjoerg
215806f32e7eSjoergOther Set-Like Container Options
215906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
216006f32e7eSjoerg
216106f32e7eSjoergThe STL provides several other options, such as std::multiset and the various
216206f32e7eSjoerg"hash_set" like containers (whether from C++ TR1 or from the SGI library).  We
216306f32e7eSjoergnever use hash_set and unordered_set because they are generally very expensive
216406f32e7eSjoerg(each insertion requires a malloc) and very non-portable.
216506f32e7eSjoerg
216606f32e7eSjoergstd::multiset is useful if you're not interested in elimination of duplicates,
216706f32e7eSjoergbut has all the drawbacks of :ref:`std::set <dss_set>`.  A sorted vector
216806f32e7eSjoerg(where you don't delete duplicate entries) or some other approach is almost
216906f32e7eSjoergalways better.
217006f32e7eSjoerg
217106f32e7eSjoerg.. _ds_map:
217206f32e7eSjoerg
217306f32e7eSjoergMap-Like Containers (std::map, DenseMap, etc)
217406f32e7eSjoerg---------------------------------------------
217506f32e7eSjoerg
217606f32e7eSjoergMap-like containers are useful when you want to associate data to a key.  As
217706f32e7eSjoergusual, there are a lot of different ways to do this. :)
217806f32e7eSjoerg
217906f32e7eSjoerg.. _dss_sortedvectormap:
218006f32e7eSjoerg
218106f32e7eSjoergA sorted 'vector'
218206f32e7eSjoerg^^^^^^^^^^^^^^^^^
218306f32e7eSjoerg
218406f32e7eSjoergIf your usage pattern follows a strict insert-then-query approach, you can
218506f32e7eSjoergtrivially use the same approach as :ref:`sorted vectors for set-like containers
218606f32e7eSjoerg<dss_sortedvectorset>`.  The only difference is that your query function (which
218706f32e7eSjoerguses std::lower_bound to get efficient log(n) lookup) should only compare the
218806f32e7eSjoergkey, not both the key and value.  This yields the same advantages as sorted
218906f32e7eSjoergvectors for sets.
219006f32e7eSjoerg
219106f32e7eSjoerg.. _dss_stringmap:
219206f32e7eSjoerg
219306f32e7eSjoergllvm/ADT/StringMap.h
219406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
219506f32e7eSjoerg
219606f32e7eSjoergStrings are commonly used as keys in maps, and they are difficult to support
219706f32e7eSjoergefficiently: they are variable length, inefficient to hash and compare when
219806f32e7eSjoerglong, expensive to copy, etc.  StringMap is a specialized container designed to
219906f32e7eSjoergcope with these issues.  It supports mapping an arbitrary range of bytes to an
220006f32e7eSjoergarbitrary other object.
220106f32e7eSjoerg
220206f32e7eSjoergThe StringMap implementation uses a quadratically-probed hash table, where the
220306f32e7eSjoergbuckets store a pointer to the heap allocated entries (and some other stuff).
220406f32e7eSjoergThe entries in the map must be heap allocated because the strings are variable
220506f32e7eSjoerglength.  The string data (key) and the element object (value) are stored in the
220606f32e7eSjoergsame allocation with the string data immediately after the element object.
220706f32e7eSjoergThis container guarantees the "``(char*)(&Value+1)``" points to the key string
220806f32e7eSjoergfor a value.
220906f32e7eSjoerg
221006f32e7eSjoergThe StringMap is very fast for several reasons: quadratic probing is very cache
221106f32e7eSjoergefficient for lookups, the hash value of strings in buckets is not recomputed
221206f32e7eSjoergwhen looking up an element, StringMap rarely has to touch the memory for
221306f32e7eSjoergunrelated objects when looking up a value (even when hash collisions happen),
221406f32e7eSjoerghash table growth does not recompute the hash values for strings already in the
221506f32e7eSjoergtable, and each pair in the map is store in a single allocation (the string data
221606f32e7eSjoergis stored in the same allocation as the Value of a pair).
221706f32e7eSjoerg
221806f32e7eSjoergStringMap also provides query methods that take byte ranges, so it only ever
221906f32e7eSjoergcopies a string if a value is inserted into the table.
222006f32e7eSjoerg
222106f32e7eSjoergStringMap iteration order, however, is not guaranteed to be deterministic, so
222206f32e7eSjoergany uses which require that should instead use a std::map.
222306f32e7eSjoerg
222406f32e7eSjoerg.. _dss_indexmap:
222506f32e7eSjoerg
222606f32e7eSjoergllvm/ADT/IndexedMap.h
222706f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^
222806f32e7eSjoerg
222906f32e7eSjoergIndexedMap is a specialized container for mapping small dense integers (or
223006f32e7eSjoergvalues that can be mapped to small dense integers) to some other type.  It is
223106f32e7eSjoerginternally implemented as a vector with a mapping function that maps the keys
223206f32e7eSjoergto the dense integer range.
223306f32e7eSjoerg
223406f32e7eSjoergThis is useful for cases like virtual registers in the LLVM code generator: they
223506f32e7eSjoerghave a dense mapping that is offset by a compile-time constant (the first
223606f32e7eSjoergvirtual register ID).
223706f32e7eSjoerg
223806f32e7eSjoerg.. _dss_densemap:
223906f32e7eSjoerg
224006f32e7eSjoergllvm/ADT/DenseMap.h
224106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
224206f32e7eSjoerg
224306f32e7eSjoergDenseMap is a simple quadratically probed hash table.  It excels at supporting
224406f32e7eSjoergsmall keys and values: it uses a single allocation to hold all of the pairs
224506f32e7eSjoergthat are currently inserted in the map.  DenseMap is a great way to map
224606f32e7eSjoergpointers to pointers, or map other small types to each other.
224706f32e7eSjoerg
224806f32e7eSjoergThere are several aspects of DenseMap that you should be aware of, however.
224906f32e7eSjoergThe iterators in a DenseMap are invalidated whenever an insertion occurs,
225006f32e7eSjoergunlike map.  Also, because DenseMap allocates space for a large number of
225106f32e7eSjoergkey/value pairs (it starts with 64 by default), it will waste a lot of space if
225206f32e7eSjoergyour keys or values are large.  Finally, you must implement a partial
225306f32e7eSjoergspecialization of DenseMapInfo for the key that you want, if it isn't already
225406f32e7eSjoergsupported.  This is required to tell DenseMap about two special marker values
225506f32e7eSjoerg(which can never be inserted into the map) that it needs internally.
225606f32e7eSjoerg
225706f32e7eSjoergDenseMap's find_as() method supports lookup operations using an alternate key
225806f32e7eSjoergtype.  This is useful in cases where the normal key type is expensive to
225906f32e7eSjoergconstruct, but cheap to compare against.  The DenseMapInfo is responsible for
226006f32e7eSjoergdefining the appropriate comparison and hashing methods for each alternate key
226106f32e7eSjoergtype used.
226206f32e7eSjoerg
226306f32e7eSjoerg.. _dss_valuemap:
226406f32e7eSjoerg
226506f32e7eSjoergllvm/IR/ValueMap.h
226606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^
226706f32e7eSjoerg
226806f32e7eSjoergValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
226906f32e7eSjoerg``Value*``\ s (or subclasses) to another type.  When a Value is deleted or
227006f32e7eSjoergRAUW'ed, ValueMap will update itself so the new version of the key is mapped to
227106f32e7eSjoergthe same value, just as if the key were a WeakVH.  You can configure exactly how
227206f32e7eSjoergthis happens, and what else happens on these two events, by passing a ``Config``
227306f32e7eSjoergparameter to the ValueMap template.
227406f32e7eSjoerg
227506f32e7eSjoerg.. _dss_intervalmap:
227606f32e7eSjoerg
227706f32e7eSjoergllvm/ADT/IntervalMap.h
227806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^
227906f32e7eSjoerg
228006f32e7eSjoergIntervalMap is a compact map for small keys and values.  It maps key intervals
228106f32e7eSjoerginstead of single keys, and it will automatically coalesce adjacent intervals.
228206f32e7eSjoergWhen the map only contains a few intervals, they are stored in the map object
228306f32e7eSjoergitself to avoid allocations.
228406f32e7eSjoerg
228506f32e7eSjoergThe IntervalMap iterators are quite big, so they should not be passed around as
228606f32e7eSjoergSTL iterators.  The heavyweight iterators allow a smaller data structure.
228706f32e7eSjoerg
228806f32e7eSjoerg.. _dss_map:
228906f32e7eSjoerg
229006f32e7eSjoerg<map>
229106f32e7eSjoerg^^^^^
229206f32e7eSjoerg
229306f32e7eSjoergstd::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a
229406f32e7eSjoergsingle allocation per pair inserted into the map, it offers log(n) lookup with
229506f32e7eSjoergan extremely large constant factor, imposes a space penalty of 3 pointers per
229606f32e7eSjoergpair in the map, etc.
229706f32e7eSjoerg
229806f32e7eSjoergstd::map is most useful when your keys or values are very large, if you need to
229906f32e7eSjoergiterate over the collection in sorted order, or if you need stable iterators
230006f32e7eSjoerginto the map (i.e. they don't get invalidated if an insertion or deletion of
230106f32e7eSjoerganother element takes place).
230206f32e7eSjoerg
230306f32e7eSjoerg.. _dss_mapvector:
230406f32e7eSjoerg
230506f32e7eSjoergllvm/ADT/MapVector.h
230606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^
230706f32e7eSjoerg
230806f32e7eSjoerg``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface.  The
230906f32e7eSjoergmain difference is that the iteration order is guaranteed to be the insertion
231006f32e7eSjoergorder, making it an easy (but somewhat expensive) solution for non-deterministic
231106f32e7eSjoergiteration over maps of pointers.
231206f32e7eSjoerg
231306f32e7eSjoergIt is implemented by mapping from key to an index in a vector of key,value
231406f32e7eSjoergpairs.  This provides fast lookup and iteration, but has two main drawbacks:
231506f32e7eSjoergthe key is stored twice and removing elements takes linear time.  If it is
231606f32e7eSjoergnecessary to remove elements, it's best to remove them in bulk using
231706f32e7eSjoerg``remove_if()``.
231806f32e7eSjoerg
231906f32e7eSjoerg.. _dss_inteqclasses:
232006f32e7eSjoerg
232106f32e7eSjoergllvm/ADT/IntEqClasses.h
232206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
232306f32e7eSjoerg
232406f32e7eSjoergIntEqClasses provides a compact representation of equivalence classes of small
232506f32e7eSjoergintegers.  Initially, each integer in the range 0..n-1 has its own equivalence
232606f32e7eSjoergclass.  Classes can be joined by passing two class representatives to the
232706f32e7eSjoergjoin(a, b) method.  Two integers are in the same class when findLeader() returns
232806f32e7eSjoergthe same representative.
232906f32e7eSjoerg
233006f32e7eSjoergOnce all equivalence classes are formed, the map can be compressed so each
233106f32e7eSjoerginteger 0..n-1 maps to an equivalence class number in the range 0..m-1, where m
233206f32e7eSjoergis the total number of equivalence classes.  The map must be uncompressed before
233306f32e7eSjoergit can be edited again.
233406f32e7eSjoerg
233506f32e7eSjoerg.. _dss_immutablemap:
233606f32e7eSjoerg
233706f32e7eSjoergllvm/ADT/ImmutableMap.h
233806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
233906f32e7eSjoerg
234006f32e7eSjoergImmutableMap is an immutable (functional) map implementation based on an AVL
234106f32e7eSjoergtree.  Adding or removing elements is done through a Factory object and results
234206f32e7eSjoergin the creation of a new ImmutableMap object.  If an ImmutableMap already exists
234306f32e7eSjoergwith the given key set, then the existing one is returned; equality is compared
234406f32e7eSjoergwith a FoldingSetNodeID.  The time and space complexity of add or remove
234506f32e7eSjoergoperations is logarithmic in the size of the original map.
234606f32e7eSjoerg
234706f32e7eSjoerg.. _dss_othermap:
234806f32e7eSjoerg
234906f32e7eSjoergOther Map-Like Container Options
235006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
235106f32e7eSjoerg
235206f32e7eSjoergThe STL provides several other options, such as std::multimap and the various
235306f32e7eSjoerg"hash_map" like containers (whether from C++ TR1 or from the SGI library).  We
235406f32e7eSjoergnever use hash_set and unordered_set because they are generally very expensive
235506f32e7eSjoerg(each insertion requires a malloc) and very non-portable.
235606f32e7eSjoerg
235706f32e7eSjoergstd::multimap is useful if you want to map a key to multiple values, but has all
235806f32e7eSjoergthe drawbacks of std::map.  A sorted vector or some other approach is almost
235906f32e7eSjoergalways better.
236006f32e7eSjoerg
236106f32e7eSjoerg.. _ds_bit:
236206f32e7eSjoerg
2363*da58b97aSjoergBit storage containers (BitVector, SparseBitVector, CoalescingBitVector)
2364*da58b97aSjoerg------------------------------------------------------------------------
236506f32e7eSjoerg
2366*da58b97aSjoergThere are three bit storage containers, and choosing when to use each is
2367*da58b97aSjoergrelatively straightforward.
236806f32e7eSjoerg
236906f32e7eSjoergOne additional option is ``std::vector<bool>``: we discourage its use for two
237006f32e7eSjoergreasons 1) the implementation in many common compilers (e.g.  commonly
237106f32e7eSjoergavailable versions of GCC) is extremely inefficient and 2) the C++ standards
237206f32e7eSjoergcommittee is likely to deprecate this container and/or change it significantly
237306f32e7eSjoergsomehow.  In any case, please don't use it.
237406f32e7eSjoerg
237506f32e7eSjoerg.. _dss_bitvector:
237606f32e7eSjoerg
237706f32e7eSjoergBitVector
237806f32e7eSjoerg^^^^^^^^^
237906f32e7eSjoerg
238006f32e7eSjoergThe BitVector container provides a dynamic size set of bits for manipulation.
238106f32e7eSjoergIt supports individual bit setting/testing, as well as set operations.  The set
238206f32e7eSjoergoperations take time O(size of bitvector), but operations are performed one word
238306f32e7eSjoergat a time, instead of one bit at a time.  This makes the BitVector very fast for
238406f32e7eSjoergset operations compared to other containers.  Use the BitVector when you expect
238506f32e7eSjoergthe number of set bits to be high (i.e. a dense set).
238606f32e7eSjoerg
238706f32e7eSjoerg.. _dss_smallbitvector:
238806f32e7eSjoerg
238906f32e7eSjoergSmallBitVector
239006f32e7eSjoerg^^^^^^^^^^^^^^
239106f32e7eSjoerg
239206f32e7eSjoergThe SmallBitVector container provides the same interface as BitVector, but it is
239306f32e7eSjoergoptimized for the case where only a small number of bits, less than 25 or so,
239406f32e7eSjoergare needed.  It also transparently supports larger bit counts, but slightly less
239506f32e7eSjoergefficiently than a plain BitVector, so SmallBitVector should only be used when
239606f32e7eSjoerglarger counts are rare.
239706f32e7eSjoerg
239806f32e7eSjoergAt this time, SmallBitVector does not support set operations (and, or, xor), and
239906f32e7eSjoergits operator[] does not provide an assignable lvalue.
240006f32e7eSjoerg
240106f32e7eSjoerg.. _dss_sparsebitvector:
240206f32e7eSjoerg
240306f32e7eSjoergSparseBitVector
240406f32e7eSjoerg^^^^^^^^^^^^^^^
240506f32e7eSjoerg
240606f32e7eSjoergThe SparseBitVector container is much like BitVector, with one major difference:
240706f32e7eSjoergOnly the bits that are set, are stored.  This makes the SparseBitVector much
240806f32e7eSjoergmore space efficient than BitVector when the set is sparse, as well as making
240906f32e7eSjoergset operations O(number of set bits) instead of O(size of universe).  The
241006f32e7eSjoergdownside to the SparseBitVector is that setting and testing of random bits is
241106f32e7eSjoergO(N), and on large SparseBitVectors, this can be slower than BitVector.  In our
241206f32e7eSjoergimplementation, setting or testing bits in sorted order (either forwards or
241306f32e7eSjoergreverse) is O(1) worst case.  Testing and setting bits within 128 bits (depends
241406f32e7eSjoergon size) of the current bit is also O(1).  As a general statement,
241506f32e7eSjoergtesting/setting bits in a SparseBitVector is O(distance away from last set bit).
241606f32e7eSjoerg
2417*da58b97aSjoerg.. _dss_coalescingbitvector:
2418*da58b97aSjoerg
2419*da58b97aSjoergCoalescingBitVector
2420*da58b97aSjoerg^^^^^^^^^^^^^^^^^^^
2421*da58b97aSjoerg
2422*da58b97aSjoergThe CoalescingBitVector container is similar in principle to a SparseBitVector,
2423*da58b97aSjoergbut is optimized to represent large contiguous ranges of set bits compactly. It
2424*da58b97aSjoergdoes this by coalescing contiguous ranges of set bits into intervals. Searching
2425*da58b97aSjoergfor a bit in a CoalescingBitVector is O(log(gaps between contiguous ranges)).
2426*da58b97aSjoerg
2427*da58b97aSjoergCoalescingBitVector is a better choice than BitVector when gaps between ranges
2428*da58b97aSjoergof set bits are large. It's a better choice than SparseBitVector when find()
2429*da58b97aSjoergoperations must have fast, predictable performance. However, it's not a good
2430*da58b97aSjoergchoice for representing sets which have lots of very short ranges. E.g. the set
2431*da58b97aSjoerg`{2*x : x \in [0, n)}` would be a pathological input.
2432*da58b97aSjoerg
243306f32e7eSjoerg.. _debugging:
243406f32e7eSjoerg
243506f32e7eSjoergDebugging
243606f32e7eSjoerg=========
243706f32e7eSjoerg
243806f32e7eSjoergA handful of `GDB pretty printers
243906f32e7eSjoerg<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
244006f32e7eSjoergprovided for some of the core LLVM libraries. To use them, execute the
244106f32e7eSjoergfollowing (or add it to your ``~/.gdbinit``)::
244206f32e7eSjoerg
244306f32e7eSjoerg  source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py
244406f32e7eSjoerg
244506f32e7eSjoergIt also might be handy to enable the `print pretty
244606f32e7eSjoerg<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to
244706f32e7eSjoergavoid data structures being printed as a big block of text.
244806f32e7eSjoerg
244906f32e7eSjoerg.. _common:
245006f32e7eSjoerg
245106f32e7eSjoergHelpful Hints for Common Operations
245206f32e7eSjoerg===================================
245306f32e7eSjoerg
245406f32e7eSjoergThis section describes how to perform some very simple transformations of LLVM
245506f32e7eSjoergcode.  This is meant to give examples of common idioms used, showing the
245606f32e7eSjoergpractical side of LLVM transformations.
245706f32e7eSjoerg
245806f32e7eSjoergBecause this is a "how-to" section, you should also read about the main classes
245906f32e7eSjoergthat you will be working with.  The :ref:`Core LLVM Class Hierarchy Reference
246006f32e7eSjoerg<coreclasses>` contains details and descriptions of the main classes that you
246106f32e7eSjoergshould know about.
246206f32e7eSjoerg
246306f32e7eSjoerg.. _inspection:
246406f32e7eSjoerg
246506f32e7eSjoergBasic Inspection and Traversal Routines
246606f32e7eSjoerg---------------------------------------
246706f32e7eSjoerg
246806f32e7eSjoergThe LLVM compiler infrastructure have many different data structures that may be
246906f32e7eSjoergtraversed.  Following the example of the C++ standard template library, the
247006f32e7eSjoergtechniques used to traverse these various data structures are all basically the
2471*da58b97aSjoergsame.  For an enumerable sequence of values, the ``XXXbegin()`` function (or
247206f32e7eSjoergmethod) returns an iterator to the start of the sequence, the ``XXXend()``
247306f32e7eSjoergfunction returns an iterator pointing to one past the last valid element of the
247406f32e7eSjoergsequence, and there is some ``XXXiterator`` data type that is common between the
247506f32e7eSjoergtwo operations.
247606f32e7eSjoerg
247706f32e7eSjoergBecause the pattern for iteration is common across many different aspects of the
247806f32e7eSjoergprogram representation, the standard template library algorithms may be used on
247906f32e7eSjoergthem, and it is easier to remember how to iterate.  First we show a few common
248006f32e7eSjoergexamples of the data structures that need to be traversed.  Other data
248106f32e7eSjoergstructures are traversed in very similar ways.
248206f32e7eSjoerg
248306f32e7eSjoerg.. _iterate_function:
248406f32e7eSjoerg
248506f32e7eSjoergIterating over the ``BasicBlock`` in a ``Function``
248606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
248706f32e7eSjoerg
248806f32e7eSjoergIt's quite common to have a ``Function`` instance that you'd like to transform
248906f32e7eSjoergin some way; in particular, you'd like to manipulate its ``BasicBlock``\ s.  To
249006f32e7eSjoergfacilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that
249106f32e7eSjoergconstitute the ``Function``.  The following is an example that prints the name
249206f32e7eSjoergof a ``BasicBlock`` and the number of ``Instruction``\ s it contains:
249306f32e7eSjoerg
249406f32e7eSjoerg.. code-block:: c++
249506f32e7eSjoerg
249606f32e7eSjoerg  Function &Func = ...
249706f32e7eSjoerg  for (BasicBlock &BB : Func)
249806f32e7eSjoerg    // Print out the name of the basic block if it has one, and then the
249906f32e7eSjoerg    // number of instructions that it contains
250006f32e7eSjoerg    errs() << "Basic block (name=" << BB.getName() << ") has "
250106f32e7eSjoerg               << BB.size() << " instructions.\n";
250206f32e7eSjoerg
250306f32e7eSjoerg.. _iterate_basicblock:
250406f32e7eSjoerg
250506f32e7eSjoergIterating over the ``Instruction`` in a ``BasicBlock``
250606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
250706f32e7eSjoerg
250806f32e7eSjoergJust like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to
250906f32e7eSjoergiterate over the individual instructions that make up ``BasicBlock``\ s.  Here's
251006f32e7eSjoerga code snippet that prints out each instruction in a ``BasicBlock``:
251106f32e7eSjoerg
251206f32e7eSjoerg.. code-block:: c++
251306f32e7eSjoerg
251406f32e7eSjoerg  BasicBlock& BB = ...
251506f32e7eSjoerg  for (Instruction &I : BB)
251606f32e7eSjoerg     // The next statement works since operator<<(ostream&,...)
251706f32e7eSjoerg     // is overloaded for Instruction&
251806f32e7eSjoerg     errs() << I << "\n";
251906f32e7eSjoerg
252006f32e7eSjoerg
252106f32e7eSjoergHowever, this isn't really the best way to print out the contents of a
252206f32e7eSjoerg``BasicBlock``!  Since the ostream operators are overloaded for virtually
252306f32e7eSjoerganything you'll care about, you could have just invoked the print routine on the
252406f32e7eSjoergbasic block itself: ``errs() << BB << "\n";``.
252506f32e7eSjoerg
252606f32e7eSjoerg.. _iterate_insiter:
252706f32e7eSjoerg
252806f32e7eSjoergIterating over the ``Instruction`` in a ``Function``
252906f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
253006f32e7eSjoerg
253106f32e7eSjoergIf you're finding that you commonly iterate over a ``Function``'s
253206f32e7eSjoerg``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s,
253306f32e7eSjoerg``InstIterator`` should be used instead.  You'll need to include
253406f32e7eSjoerg``llvm/IR/InstIterator.h`` (`doxygen
2535*da58b97aSjoerg<https://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate
253606f32e7eSjoerg``InstIterator``\ s explicitly in your code.  Here's a small example that shows
253706f32e7eSjoerghow to dump all instructions in a function to the standard error stream:
253806f32e7eSjoerg
253906f32e7eSjoerg.. code-block:: c++
254006f32e7eSjoerg
254106f32e7eSjoerg  #include "llvm/IR/InstIterator.h"
254206f32e7eSjoerg
254306f32e7eSjoerg  // F is a pointer to a Function instance
254406f32e7eSjoerg  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
254506f32e7eSjoerg    errs() << *I << "\n";
254606f32e7eSjoerg
254706f32e7eSjoergEasy, isn't it?  You can also use ``InstIterator``\ s to fill a work list with
254806f32e7eSjoergits initial contents.  For example, if you wanted to initialize a work list to
254906f32e7eSjoergcontain all instructions in a ``Function`` F, all you would need to do is
255006f32e7eSjoergsomething like:
255106f32e7eSjoerg
255206f32e7eSjoerg.. code-block:: c++
255306f32e7eSjoerg
255406f32e7eSjoerg  std::set<Instruction*> worklist;
255506f32e7eSjoerg  // or better yet, SmallPtrSet<Instruction*, 64> worklist;
255606f32e7eSjoerg
255706f32e7eSjoerg  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
255806f32e7eSjoerg    worklist.insert(&*I);
255906f32e7eSjoerg
256006f32e7eSjoergThe STL set ``worklist`` would now contain all instructions in the ``Function``
256106f32e7eSjoergpointed to by F.
256206f32e7eSjoerg
256306f32e7eSjoerg.. _iterate_convert:
256406f32e7eSjoerg
256506f32e7eSjoergTurning an iterator into a class pointer (and vice-versa)
256606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
256706f32e7eSjoerg
256806f32e7eSjoergSometimes, it'll be useful to grab a reference (or pointer) to a class instance
256906f32e7eSjoergwhen all you've got at hand is an iterator.  Well, extracting a reference or a
257006f32e7eSjoergpointer from an iterator is very straight-forward.  Assuming that ``i`` is a
257106f32e7eSjoerg``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
257206f32e7eSjoerg
257306f32e7eSjoerg.. code-block:: c++
257406f32e7eSjoerg
257506f32e7eSjoerg  Instruction& inst = *i;   // Grab reference to instruction reference
257606f32e7eSjoerg  Instruction* pinst = &*i; // Grab pointer to instruction reference
257706f32e7eSjoerg  const Instruction& inst = *j;
257806f32e7eSjoerg
257906f32e7eSjoergHowever, the iterators you'll be working with in the LLVM framework are special:
258006f32e7eSjoergthey will automatically convert to a ptr-to-instance type whenever they need to.
258106f32e7eSjoergInstead of dereferencing the iterator and then taking the address of the result,
258206f32e7eSjoergyou can simply assign the iterator to the proper pointer type and you get the
258306f32e7eSjoergdereference and address-of operation as a result of the assignment (behind the
258406f32e7eSjoergscenes, this is a result of overloading casting mechanisms).  Thus the second
258506f32e7eSjoergline of the last example,
258606f32e7eSjoerg
258706f32e7eSjoerg.. code-block:: c++
258806f32e7eSjoerg
258906f32e7eSjoerg  Instruction *pinst = &*i;
259006f32e7eSjoerg
259106f32e7eSjoergis semantically equivalent to
259206f32e7eSjoerg
259306f32e7eSjoerg.. code-block:: c++
259406f32e7eSjoerg
259506f32e7eSjoerg  Instruction *pinst = i;
259606f32e7eSjoerg
259706f32e7eSjoergIt's also possible to turn a class pointer into the corresponding iterator, and
259806f32e7eSjoergthis is a constant time operation (very efficient).  The following code snippet
259906f32e7eSjoergillustrates use of the conversion constructors provided by LLVM iterators.  By
260006f32e7eSjoergusing these, you can explicitly grab the iterator of something without actually
260106f32e7eSjoergobtaining it via iteration over some structure:
260206f32e7eSjoerg
260306f32e7eSjoerg.. code-block:: c++
260406f32e7eSjoerg
260506f32e7eSjoerg  void printNextInstruction(Instruction* inst) {
260606f32e7eSjoerg    BasicBlock::iterator it(inst);
260706f32e7eSjoerg    ++it; // After this line, it refers to the instruction after *inst
260806f32e7eSjoerg    if (it != inst->getParent()->end()) errs() << *it << "\n";
260906f32e7eSjoerg  }
261006f32e7eSjoerg
261106f32e7eSjoergUnfortunately, these implicit conversions come at a cost; they prevent these
261206f32e7eSjoergiterators from conforming to standard iterator conventions, and thus from being
261306f32e7eSjoergusable with standard algorithms and containers.  For example, they prevent the
261406f32e7eSjoergfollowing code, where ``B`` is a ``BasicBlock``, from compiling:
261506f32e7eSjoerg
261606f32e7eSjoerg.. code-block:: c++
261706f32e7eSjoerg
261806f32e7eSjoerg  llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
261906f32e7eSjoerg
262006f32e7eSjoergBecause of this, these implicit conversions may be removed some day, and
262106f32e7eSjoerg``operator*`` changed to return a pointer instead of a reference.
262206f32e7eSjoerg
262306f32e7eSjoerg.. _iterate_complex:
262406f32e7eSjoerg
262506f32e7eSjoergFinding call sites: a slightly more complex example
262606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
262706f32e7eSjoerg
262806f32e7eSjoergSay that you're writing a FunctionPass and would like to count all the locations
262906f32e7eSjoergin the entire module (that is, across every ``Function``) where a certain
263006f32e7eSjoergfunction (i.e., some ``Function *``) is already in scope.  As you'll learn
263106f32e7eSjoerglater, you may want to use an ``InstVisitor`` to accomplish this in a much more
263206f32e7eSjoergstraight-forward manner, but this example will allow us to explore how you'd do
263306f32e7eSjoergit if you didn't have ``InstVisitor`` around.  In pseudo-code, this is what we
263406f32e7eSjoergwant to do:
263506f32e7eSjoerg
263606f32e7eSjoerg.. code-block:: none
263706f32e7eSjoerg
263806f32e7eSjoerg  initialize callCounter to zero
263906f32e7eSjoerg  for each Function f in the Module
264006f32e7eSjoerg    for each BasicBlock b in f
264106f32e7eSjoerg      for each Instruction i in b
2642*da58b97aSjoerg        if (i a Call and calls the given function)
264306f32e7eSjoerg          increment callCounter
264406f32e7eSjoerg
264506f32e7eSjoergAnd the actual code is (remember, because we're writing a ``FunctionPass``, our
264606f32e7eSjoerg``FunctionPass``-derived class simply has to override the ``runOnFunction``
264706f32e7eSjoergmethod):
264806f32e7eSjoerg
264906f32e7eSjoerg.. code-block:: c++
265006f32e7eSjoerg
265106f32e7eSjoerg  Function* targetFunc = ...;
265206f32e7eSjoerg
265306f32e7eSjoerg  class OurFunctionPass : public FunctionPass {
265406f32e7eSjoerg    public:
265506f32e7eSjoerg      OurFunctionPass(): callCounter(0) { }
265606f32e7eSjoerg
265706f32e7eSjoerg      virtual runOnFunction(Function& F) {
265806f32e7eSjoerg        for (BasicBlock &B : F) {
265906f32e7eSjoerg          for (Instruction &I: B) {
2660*da58b97aSjoerg            if (auto *CB = dyn_cast<CallBase>(&I)) {
2661*da58b97aSjoerg              // We know we've encountered some kind of call instruction (call,
2662*da58b97aSjoerg              // invoke, or callbr), so we need to determine if it's a call to
2663*da58b97aSjoerg              // the function pointed to by m_func or not.
2664*da58b97aSjoerg              if (CB->getCalledFunction() == targetFunc)
266506f32e7eSjoerg                ++callCounter;
266606f32e7eSjoerg            }
266706f32e7eSjoerg          }
266806f32e7eSjoerg        }
266906f32e7eSjoerg      }
267006f32e7eSjoerg
267106f32e7eSjoerg    private:
267206f32e7eSjoerg      unsigned callCounter;
267306f32e7eSjoerg  };
267406f32e7eSjoerg
267506f32e7eSjoerg.. _iterate_chains:
267606f32e7eSjoerg
267706f32e7eSjoergIterating over def-use & use-def chains
267806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
267906f32e7eSjoerg
268006f32e7eSjoergFrequently, we might have an instance of the ``Value`` class (`doxygen
2681*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine
268206f32e7eSjoergwhich ``User``\ s use the ``Value``.  The list of all ``User``\ s of a particular
268306f32e7eSjoerg``Value`` is called a *def-use* chain.  For example, let's say we have a
268406f32e7eSjoerg``Function*`` named ``F`` to a particular function ``foo``.  Finding all of the
268506f32e7eSjoerginstructions that *use* ``foo`` is as simple as iterating over the *def-use*
268606f32e7eSjoergchain of ``F``:
268706f32e7eSjoerg
268806f32e7eSjoerg.. code-block:: c++
268906f32e7eSjoerg
269006f32e7eSjoerg  Function *F = ...;
269106f32e7eSjoerg
269206f32e7eSjoerg  for (User *U : F->users()) {
269306f32e7eSjoerg    if (Instruction *Inst = dyn_cast<Instruction>(U)) {
269406f32e7eSjoerg      errs() << "F is used in instruction:\n";
269506f32e7eSjoerg      errs() << *Inst << "\n";
269606f32e7eSjoerg    }
269706f32e7eSjoerg
269806f32e7eSjoergAlternatively, it's common to have an instance of the ``User`` Class (`doxygen
2699*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what
270006f32e7eSjoerg``Value``\ s are used by it.  The list of all ``Value``\ s used by a ``User`` is
270106f32e7eSjoergknown as a *use-def* chain.  Instances of class ``Instruction`` are common
270206f32e7eSjoerg``User`` s, so we might want to iterate over all of the values that a particular
270306f32e7eSjoerginstruction uses (that is, the operands of the particular ``Instruction``):
270406f32e7eSjoerg
270506f32e7eSjoerg.. code-block:: c++
270606f32e7eSjoerg
270706f32e7eSjoerg  Instruction *pi = ...;
270806f32e7eSjoerg
270906f32e7eSjoerg  for (Use &U : pi->operands()) {
271006f32e7eSjoerg    Value *v = U.get();
271106f32e7eSjoerg    // ...
271206f32e7eSjoerg  }
271306f32e7eSjoerg
271406f32e7eSjoergDeclaring objects as ``const`` is an important tool of enforcing mutation free
271506f32e7eSjoergalgorithms (such as analyses, etc.).  For this purpose above iterators come in
271606f32e7eSjoergconstant flavors as ``Value::const_use_iterator`` and
271706f32e7eSjoerg``Value::const_op_iterator``.  They automatically arise when calling
271806f32e7eSjoerg``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively.
271906f32e7eSjoergUpon dereferencing, they return ``const Use*``\ s.  Otherwise the above patterns
272006f32e7eSjoergremain unchanged.
272106f32e7eSjoerg
272206f32e7eSjoerg.. _iterate_preds:
272306f32e7eSjoerg
272406f32e7eSjoergIterating over predecessors & successors of blocks
272506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
272606f32e7eSjoerg
272706f32e7eSjoergIterating over the predecessors and successors of a block is quite easy with the
272806f32e7eSjoergroutines defined in ``"llvm/IR/CFG.h"``.  Just use code like this to
272906f32e7eSjoergiterate over all predecessors of BB:
273006f32e7eSjoerg
273106f32e7eSjoerg.. code-block:: c++
273206f32e7eSjoerg
273306f32e7eSjoerg  #include "llvm/IR/CFG.h"
273406f32e7eSjoerg  BasicBlock *BB = ...;
273506f32e7eSjoerg
273606f32e7eSjoerg  for (BasicBlock *Pred : predecessors(BB)) {
273706f32e7eSjoerg    // ...
273806f32e7eSjoerg  }
273906f32e7eSjoerg
274006f32e7eSjoergSimilarly, to iterate over successors use ``successors``.
274106f32e7eSjoerg
274206f32e7eSjoerg.. _simplechanges:
274306f32e7eSjoerg
274406f32e7eSjoergMaking simple changes
274506f32e7eSjoerg---------------------
274606f32e7eSjoerg
274706f32e7eSjoergThere are some primitive transformation operations present in the LLVM
274806f32e7eSjoerginfrastructure that are worth knowing about.  When performing transformations,
274906f32e7eSjoergit's fairly common to manipulate the contents of basic blocks.  This section
275006f32e7eSjoergdescribes some of the common methods for doing so and gives example code.
275106f32e7eSjoerg
275206f32e7eSjoerg.. _schanges_creating:
275306f32e7eSjoerg
275406f32e7eSjoergCreating and inserting new ``Instruction``\ s
275506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
275606f32e7eSjoerg
275706f32e7eSjoerg*Instantiating Instructions*
275806f32e7eSjoerg
275906f32e7eSjoergCreation of ``Instruction``\ s is straight-forward: simply call the constructor
276006f32e7eSjoergfor the kind of instruction to instantiate and provide the necessary parameters.
276106f32e7eSjoergFor example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``.  Thus:
276206f32e7eSjoerg
276306f32e7eSjoerg.. code-block:: c++
276406f32e7eSjoerg
276506f32e7eSjoerg  auto *ai = new AllocaInst(Type::Int32Ty);
276606f32e7eSjoerg
276706f32e7eSjoergwill create an ``AllocaInst`` instance that represents the allocation of one
276806f32e7eSjoerginteger in the current stack frame, at run time.  Each ``Instruction`` subclass
276906f32e7eSjoergis likely to have varying default parameters which change the semantics of the
277006f32e7eSjoerginstruction, so refer to the `doxygen documentation for the subclass of
2771*da58b97aSjoergInstruction <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that
277206f32e7eSjoergyou're interested in instantiating.
277306f32e7eSjoerg
277406f32e7eSjoerg*Naming values*
277506f32e7eSjoerg
277606f32e7eSjoergIt is very useful to name the values of instructions when you're able to, as
277706f32e7eSjoergthis facilitates the debugging of your transformations.  If you end up looking
277806f32e7eSjoergat generated LLVM machine code, you definitely want to have logical names
277906f32e7eSjoergassociated with the results of instructions!  By supplying a value for the
278006f32e7eSjoerg``Name`` (default) parameter of the ``Instruction`` constructor, you associate a
278106f32e7eSjoerglogical name with the result of the instruction's execution at run time.  For
278206f32e7eSjoergexample, say that I'm writing a transformation that dynamically allocates space
278306f32e7eSjoergfor an integer on the stack, and that integer is going to be used as some kind
278406f32e7eSjoergof index by some other code.  To accomplish this, I place an ``AllocaInst`` at
278506f32e7eSjoergthe first point in the first ``BasicBlock`` of some ``Function``, and I'm
278606f32e7eSjoergintending to use it within the same ``Function``.  I might do:
278706f32e7eSjoerg
278806f32e7eSjoerg.. code-block:: c++
278906f32e7eSjoerg
279006f32e7eSjoerg  auto *pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
279106f32e7eSjoerg
279206f32e7eSjoergwhere ``indexLoc`` is now the logical name of the instruction's execution value,
279306f32e7eSjoergwhich is a pointer to an integer on the run time stack.
279406f32e7eSjoerg
279506f32e7eSjoerg*Inserting instructions*
279606f32e7eSjoerg
279706f32e7eSjoergThere are essentially three ways to insert an ``Instruction`` into an existing
279806f32e7eSjoergsequence of instructions that form a ``BasicBlock``:
279906f32e7eSjoerg
280006f32e7eSjoerg* Insertion into an explicit instruction list
280106f32e7eSjoerg
280206f32e7eSjoerg  Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``,
280306f32e7eSjoerg  and a newly-created instruction we wish to insert before ``*pi``, we do the
280406f32e7eSjoerg  following:
280506f32e7eSjoerg
280606f32e7eSjoerg  .. code-block:: c++
280706f32e7eSjoerg
280806f32e7eSjoerg      BasicBlock *pb = ...;
280906f32e7eSjoerg      Instruction *pi = ...;
281006f32e7eSjoerg      auto *newInst = new Instruction(...);
281106f32e7eSjoerg
281206f32e7eSjoerg      pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
281306f32e7eSjoerg
281406f32e7eSjoerg  Appending to the end of a ``BasicBlock`` is so common that the ``Instruction``
281506f32e7eSjoerg  class and ``Instruction``-derived classes provide constructors which take a
281606f32e7eSjoerg  pointer to a ``BasicBlock`` to be appended to.  For example code that looked
281706f32e7eSjoerg  like:
281806f32e7eSjoerg
281906f32e7eSjoerg  .. code-block:: c++
282006f32e7eSjoerg
282106f32e7eSjoerg    BasicBlock *pb = ...;
282206f32e7eSjoerg    auto *newInst = new Instruction(...);
282306f32e7eSjoerg
282406f32e7eSjoerg    pb->getInstList().push_back(newInst); // Appends newInst to pb
282506f32e7eSjoerg
282606f32e7eSjoerg  becomes:
282706f32e7eSjoerg
282806f32e7eSjoerg  .. code-block:: c++
282906f32e7eSjoerg
283006f32e7eSjoerg    BasicBlock *pb = ...;
283106f32e7eSjoerg    auto *newInst = new Instruction(..., pb);
283206f32e7eSjoerg
283306f32e7eSjoerg  which is much cleaner, especially if you are creating long instruction
283406f32e7eSjoerg  streams.
283506f32e7eSjoerg
283606f32e7eSjoerg* Insertion into an implicit instruction list
283706f32e7eSjoerg
283806f32e7eSjoerg  ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly
283906f32e7eSjoerg  associated with an existing instruction list: the instruction list of the
284006f32e7eSjoerg  enclosing basic block.  Thus, we could have accomplished the same thing as the
284106f32e7eSjoerg  above code without being given a ``BasicBlock`` by doing:
284206f32e7eSjoerg
284306f32e7eSjoerg  .. code-block:: c++
284406f32e7eSjoerg
284506f32e7eSjoerg    Instruction *pi = ...;
284606f32e7eSjoerg    auto *newInst = new Instruction(...);
284706f32e7eSjoerg
284806f32e7eSjoerg    pi->getParent()->getInstList().insert(pi, newInst);
284906f32e7eSjoerg
285006f32e7eSjoerg  In fact, this sequence of steps occurs so frequently that the ``Instruction``
285106f32e7eSjoerg  class and ``Instruction``-derived classes provide constructors which take (as
285206f32e7eSjoerg  a default parameter) a pointer to an ``Instruction`` which the newly-created
285306f32e7eSjoerg  ``Instruction`` should precede.  That is, ``Instruction`` constructors are
285406f32e7eSjoerg  capable of inserting the newly-created instance into the ``BasicBlock`` of a
285506f32e7eSjoerg  provided instruction, immediately before that instruction.  Using an
285606f32e7eSjoerg  ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the
285706f32e7eSjoerg  above code becomes:
285806f32e7eSjoerg
285906f32e7eSjoerg  .. code-block:: c++
286006f32e7eSjoerg
286106f32e7eSjoerg    Instruction* pi = ...;
286206f32e7eSjoerg    auto *newInst = new Instruction(..., pi);
286306f32e7eSjoerg
286406f32e7eSjoerg  which is much cleaner, especially if you're creating a lot of instructions and
286506f32e7eSjoerg  adding them to ``BasicBlock``\ s.
286606f32e7eSjoerg
286706f32e7eSjoerg* Insertion using an instance of ``IRBuilder``
286806f32e7eSjoerg
286906f32e7eSjoerg  Inserting several ``Instruction``\ s can be quite laborious using the previous
287006f32e7eSjoerg  methods. The ``IRBuilder`` is a convenience class that can be used to add
287106f32e7eSjoerg  several instructions to the end of a ``BasicBlock`` or before a particular
287206f32e7eSjoerg  ``Instruction``. It also supports constant folding and renaming named
287306f32e7eSjoerg  registers (see ``IRBuilder``'s template arguments).
287406f32e7eSjoerg
287506f32e7eSjoerg  The example below demonstrates a very simple use of the ``IRBuilder`` where
287606f32e7eSjoerg  three instructions are inserted before the instruction ``pi``. The first two
287706f32e7eSjoerg  instructions are Call instructions and third instruction multiplies the return
287806f32e7eSjoerg  value of the two calls.
287906f32e7eSjoerg
288006f32e7eSjoerg  .. code-block:: c++
288106f32e7eSjoerg
288206f32e7eSjoerg    Instruction *pi = ...;
288306f32e7eSjoerg    IRBuilder<> Builder(pi);
288406f32e7eSjoerg    CallInst* callOne = Builder.CreateCall(...);
288506f32e7eSjoerg    CallInst* callTwo = Builder.CreateCall(...);
288606f32e7eSjoerg    Value* result = Builder.CreateMul(callOne, callTwo);
288706f32e7eSjoerg
288806f32e7eSjoerg  The example below is similar to the above example except that the created
288906f32e7eSjoerg  ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``.
289006f32e7eSjoerg
289106f32e7eSjoerg  .. code-block:: c++
289206f32e7eSjoerg
289306f32e7eSjoerg    BasicBlock *pb = ...;
289406f32e7eSjoerg    IRBuilder<> Builder(pb);
289506f32e7eSjoerg    CallInst* callOne = Builder.CreateCall(...);
289606f32e7eSjoerg    CallInst* callTwo = Builder.CreateCall(...);
289706f32e7eSjoerg    Value* result = Builder.CreateMul(callOne, callTwo);
289806f32e7eSjoerg
289906f32e7eSjoerg  See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``.
290006f32e7eSjoerg
290106f32e7eSjoerg
290206f32e7eSjoerg.. _schanges_deleting:
290306f32e7eSjoerg
290406f32e7eSjoergDeleting Instructions
290506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^
290606f32e7eSjoerg
290706f32e7eSjoergDeleting an instruction from an existing sequence of instructions that form a
290806f32e7eSjoergBasicBlock_ is very straight-forward: just call the instruction's
290906f32e7eSjoerg``eraseFromParent()`` method.  For example:
291006f32e7eSjoerg
291106f32e7eSjoerg.. code-block:: c++
291206f32e7eSjoerg
291306f32e7eSjoerg  Instruction *I = .. ;
291406f32e7eSjoerg  I->eraseFromParent();
291506f32e7eSjoerg
291606f32e7eSjoergThis unlinks the instruction from its containing basic block and deletes it.  If
291706f32e7eSjoergyou'd just like to unlink the instruction from its containing basic block but
291806f32e7eSjoergnot delete it, you can use the ``removeFromParent()`` method.
291906f32e7eSjoerg
292006f32e7eSjoerg.. _schanges_replacing:
292106f32e7eSjoerg
292206f32e7eSjoergReplacing an Instruction with another Value
292306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
292406f32e7eSjoerg
292506f32e7eSjoergReplacing individual instructions
292606f32e7eSjoerg"""""""""""""""""""""""""""""""""
292706f32e7eSjoerg
292806f32e7eSjoergIncluding "`llvm/Transforms/Utils/BasicBlockUtils.h
2929*da58b97aSjoerg<https://llvm.org/doxygen/BasicBlockUtils_8h_source.html>`_" permits use of two
293006f32e7eSjoergvery useful replace functions: ``ReplaceInstWithValue`` and
293106f32e7eSjoerg``ReplaceInstWithInst``.
293206f32e7eSjoerg
293306f32e7eSjoerg.. _schanges_deleting_sub:
293406f32e7eSjoerg
293506f32e7eSjoergDeleting Instructions
293606f32e7eSjoerg"""""""""""""""""""""
293706f32e7eSjoerg
293806f32e7eSjoerg* ``ReplaceInstWithValue``
293906f32e7eSjoerg
294006f32e7eSjoerg  This function replaces all uses of a given instruction with a value, and then
294106f32e7eSjoerg  removes the original instruction.  The following example illustrates the
294206f32e7eSjoerg  replacement of the result of a particular ``AllocaInst`` that allocates memory
294306f32e7eSjoerg  for a single integer with a null pointer to an integer.
294406f32e7eSjoerg
294506f32e7eSjoerg  .. code-block:: c++
294606f32e7eSjoerg
294706f32e7eSjoerg    AllocaInst* instToReplace = ...;
294806f32e7eSjoerg    BasicBlock::iterator ii(instToReplace);
294906f32e7eSjoerg
295006f32e7eSjoerg    ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
295106f32e7eSjoerg                         Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
295206f32e7eSjoerg
295306f32e7eSjoerg* ``ReplaceInstWithInst``
295406f32e7eSjoerg
295506f32e7eSjoerg  This function replaces a particular instruction with another instruction,
295606f32e7eSjoerg  inserting the new instruction into the basic block at the location where the
295706f32e7eSjoerg  old instruction was, and replacing any uses of the old instruction with the
295806f32e7eSjoerg  new instruction.  The following example illustrates the replacement of one
295906f32e7eSjoerg  ``AllocaInst`` with another.
296006f32e7eSjoerg
296106f32e7eSjoerg  .. code-block:: c++
296206f32e7eSjoerg
296306f32e7eSjoerg    AllocaInst* instToReplace = ...;
296406f32e7eSjoerg    BasicBlock::iterator ii(instToReplace);
296506f32e7eSjoerg
296606f32e7eSjoerg    ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
296706f32e7eSjoerg                        new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
296806f32e7eSjoerg
296906f32e7eSjoerg
297006f32e7eSjoergReplacing multiple uses of Users and Values
297106f32e7eSjoerg"""""""""""""""""""""""""""""""""""""""""""
297206f32e7eSjoerg
297306f32e7eSjoergYou can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to
297406f32e7eSjoergchange more than one use at a time.  See the doxygen documentation for the
2975*da58b97aSjoerg`Value Class <https://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class
2976*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more
297706f32e7eSjoerginformation.
297806f32e7eSjoerg
297906f32e7eSjoerg.. _schanges_deletingGV:
298006f32e7eSjoerg
298106f32e7eSjoergDeleting GlobalVariables
298206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^
298306f32e7eSjoerg
298406f32e7eSjoergDeleting a global variable from a module is just as easy as deleting an
298506f32e7eSjoergInstruction.  First, you must have a pointer to the global variable that you
298606f32e7eSjoergwish to delete.  You use this pointer to erase it from its parent, the module.
298706f32e7eSjoergFor example:
298806f32e7eSjoerg
298906f32e7eSjoerg.. code-block:: c++
299006f32e7eSjoerg
299106f32e7eSjoerg  GlobalVariable *GV = .. ;
299206f32e7eSjoerg
299306f32e7eSjoerg  GV->eraseFromParent();
299406f32e7eSjoerg
299506f32e7eSjoerg
299606f32e7eSjoerg.. _threading:
299706f32e7eSjoerg
299806f32e7eSjoergThreads and LLVM
299906f32e7eSjoerg================
300006f32e7eSjoerg
300106f32e7eSjoergThis section describes the interaction of the LLVM APIs with multithreading,
300206f32e7eSjoergboth on the part of client applications, and in the JIT, in the hosted
300306f32e7eSjoergapplication.
300406f32e7eSjoerg
300506f32e7eSjoergNote that LLVM's support for multithreading is still relatively young.  Up
300606f32e7eSjoergthrough version 2.5, the execution of threaded hosted applications was
300706f32e7eSjoergsupported, but not threaded client access to the APIs.  While this use case is
300806f32e7eSjoergnow supported, clients *must* adhere to the guidelines specified below to ensure
300906f32e7eSjoergproper operation in multithreaded mode.
301006f32e7eSjoerg
301106f32e7eSjoergNote that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
301206f32e7eSjoergintrinsics in order to support threaded operation.  If you need a
3013*da58b97aSjoergmultithreading-capable LLVM on a platform without a suitably modern system
301406f32e7eSjoergcompiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
301506f32e7eSjoergusing the resultant compiler to build a copy of LLVM with multithreading
301606f32e7eSjoergsupport.
301706f32e7eSjoerg
301806f32e7eSjoerg.. _shutdown:
301906f32e7eSjoerg
302006f32e7eSjoergEnding Execution with ``llvm_shutdown()``
302106f32e7eSjoerg-----------------------------------------
302206f32e7eSjoerg
302306f32e7eSjoergWhen you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
302406f32e7eSjoergdeallocate memory used for internal structures.
302506f32e7eSjoerg
302606f32e7eSjoerg.. _managedstatic:
302706f32e7eSjoerg
302806f32e7eSjoergLazy Initialization with ``ManagedStatic``
302906f32e7eSjoerg------------------------------------------
303006f32e7eSjoerg
303106f32e7eSjoerg``ManagedStatic`` is a utility class in LLVM used to implement static
303206f32e7eSjoerginitialization of static resources, such as the global type tables.  In a
303306f32e7eSjoergsingle-threaded environment, it implements a simple lazy initialization scheme.
303406f32e7eSjoergWhen LLVM is compiled with support for multi-threading, however, it uses
303506f32e7eSjoergdouble-checked locking to implement thread-safe lazy initialization.
303606f32e7eSjoerg
303706f32e7eSjoerg.. _llvmcontext:
303806f32e7eSjoerg
303906f32e7eSjoergAchieving Isolation with ``LLVMContext``
304006f32e7eSjoerg----------------------------------------
304106f32e7eSjoerg
304206f32e7eSjoerg``LLVMContext`` is an opaque class in the LLVM API which clients can use to
304306f32e7eSjoergoperate multiple, isolated instances of LLVM concurrently within the same
304406f32e7eSjoergaddress space.  For instance, in a hypothetical compile-server, the compilation
304506f32e7eSjoergof an individual translation unit is conceptually independent from all the
304606f32e7eSjoergothers, and it would be desirable to be able to compile incoming translation
304706f32e7eSjoergunits concurrently on independent server threads.  Fortunately, ``LLVMContext``
304806f32e7eSjoergexists to enable just this kind of scenario!
304906f32e7eSjoerg
305006f32e7eSjoergConceptually, ``LLVMContext`` provides isolation.  Every LLVM entity
305106f32e7eSjoerg(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
305206f32e7eSjoergin-memory IR belongs to an ``LLVMContext``.  Entities in different contexts
305306f32e7eSjoerg*cannot* interact with each other: ``Module``\ s in different contexts cannot be
305406f32e7eSjoerglinked together, ``Function``\ s cannot be added to ``Module``\ s in different
305506f32e7eSjoergcontexts, etc.  What this means is that is safe to compile on multiple
305606f32e7eSjoergthreads simultaneously, as long as no two threads operate on entities within the
305706f32e7eSjoergsame context.
305806f32e7eSjoerg
305906f32e7eSjoergIn practice, very few places in the API require the explicit specification of a
306006f32e7eSjoerg``LLVMContext``, other than the ``Type`` creation/lookup APIs.  Because every
306106f32e7eSjoerg``Type`` carries a reference to its owning context, most other entities can
306206f32e7eSjoergdetermine what context they belong to by looking at their own ``Type``.  If you
306306f32e7eSjoergare adding new entities to LLVM IR, please try to maintain this interface
306406f32e7eSjoergdesign.
306506f32e7eSjoerg
306606f32e7eSjoerg.. _jitthreading:
306706f32e7eSjoerg
306806f32e7eSjoergThreads and the JIT
306906f32e7eSjoerg-------------------
307006f32e7eSjoerg
307106f32e7eSjoergLLVM's "eager" JIT compiler is safe to use in threaded programs.  Multiple
307206f32e7eSjoergthreads can call ``ExecutionEngine::getPointerToFunction()`` or
307306f32e7eSjoerg``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
307406f32e7eSjoergcode output by the JIT concurrently.  The user must still ensure that only one
307506f32e7eSjoergthread accesses IR in a given ``LLVMContext`` while another thread might be
307606f32e7eSjoergmodifying it.  One way to do that is to always hold the JIT lock while accessing
307706f32e7eSjoergIR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
307806f32e7eSjoergAnother way is to only call ``getPointerToFunction()`` from the
307906f32e7eSjoerg``LLVMContext``'s thread.
308006f32e7eSjoerg
308106f32e7eSjoergWhen the JIT is configured to compile lazily (using
308206f32e7eSjoerg``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
308306f32e7eSjoergcondition <https://bugs.llvm.org/show_bug.cgi?id=5184>`_ in updating call sites
308406f32e7eSjoergafter a function is lazily-jitted.  It's still possible to use the lazy JIT in a
308506f32e7eSjoergthreaded program if you ensure that only one thread at a time can call any
308606f32e7eSjoergparticular lazy stub and that the JIT lock guards any IR access, but we suggest
308706f32e7eSjoergusing only the eager JIT in threaded programs.
308806f32e7eSjoerg
308906f32e7eSjoerg.. _advanced:
309006f32e7eSjoerg
309106f32e7eSjoergAdvanced Topics
309206f32e7eSjoerg===============
309306f32e7eSjoerg
309406f32e7eSjoergThis section describes some of the advanced or obscure API's that most clients
309506f32e7eSjoergdo not need to be aware of.  These API's tend manage the inner workings of the
309606f32e7eSjoergLLVM system, and only need to be accessed in unusual circumstances.
309706f32e7eSjoerg
309806f32e7eSjoerg.. _SymbolTable:
309906f32e7eSjoerg
310006f32e7eSjoergThe ``ValueSymbolTable`` class
310106f32e7eSjoerg------------------------------
310206f32e7eSjoerg
310306f32e7eSjoergThe ``ValueSymbolTable`` (`doxygen
3104*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides
310506f32e7eSjoerga symbol table that the :ref:`Function <c_Function>` and Module_ classes use for
310606f32e7eSjoergnaming value definitions.  The symbol table can provide a name for any Value_.
310706f32e7eSjoerg
310806f32e7eSjoergNote that the ``SymbolTable`` class should not be directly accessed by most
310906f32e7eSjoergclients.  It should only be used when iteration over the symbol table names
311006f32e7eSjoergthemselves are required, which is very special purpose.  Note that not all LLVM
311106f32e7eSjoergValue_\ s have names, and those without names (i.e. they have an empty name) do
311206f32e7eSjoergnot exist in the symbol table.
311306f32e7eSjoerg
311406f32e7eSjoergSymbol tables support iteration over the values in the symbol table with
311506f32e7eSjoerg``begin/end/iterator`` and supports querying to see if a specific name is in the
311606f32e7eSjoergsymbol table (with ``lookup``).  The ``ValueSymbolTable`` class exposes no
311706f32e7eSjoergpublic mutator methods, instead, simply call ``setName`` on a value, which will
311806f32e7eSjoergautoinsert it into the appropriate symbol table.
311906f32e7eSjoerg
312006f32e7eSjoerg.. _UserLayout:
312106f32e7eSjoerg
312206f32e7eSjoergThe ``User`` and owned ``Use`` classes' memory layout
312306f32e7eSjoerg-----------------------------------------------------
312406f32e7eSjoerg
3125*da58b97aSjoergThe ``User`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1User.html>`__)
312606f32e7eSjoergclass provides a basis for expressing the ownership of ``User`` towards other
3127*da58b97aSjoerg`Value instance <https://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s.  The
3128*da58b97aSjoerg``Use`` (`doxygen <https://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper
312906f32e7eSjoergclass is employed to do the bookkeeping and to facilitate *O(1)* addition and
313006f32e7eSjoergremoval.
313106f32e7eSjoerg
313206f32e7eSjoerg.. _Use2User:
313306f32e7eSjoerg
313406f32e7eSjoergInteraction and relationship between ``User`` and ``Use`` objects
313506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
313606f32e7eSjoerg
313706f32e7eSjoergA subclass of ``User`` can choose between incorporating its ``Use`` objects or
313806f32e7eSjoergrefer to them out-of-line by means of a pointer.  A mixed variant (some ``Use``
313906f32e7eSjoergs inline others hung off) is impractical and breaks the invariant that the
314006f32e7eSjoerg``Use`` objects belonging to the same ``User`` form a contiguous array.
314106f32e7eSjoerg
314206f32e7eSjoergWe have 2 different layouts in the ``User`` (sub)classes:
314306f32e7eSjoerg
314406f32e7eSjoerg* Layout a)
314506f32e7eSjoerg
314606f32e7eSjoerg  The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User``
314706f32e7eSjoerg  object and there are a fixed number of them.
314806f32e7eSjoerg
314906f32e7eSjoerg* Layout b)
315006f32e7eSjoerg
315106f32e7eSjoerg  The ``Use`` object(s) are referenced by a pointer to an array from the
315206f32e7eSjoerg  ``User`` object and there may be a variable number of them.
315306f32e7eSjoerg
315406f32e7eSjoergAs of v2.4 each layout still possesses a direct pointer to the start of the
315506f32e7eSjoergarray of ``Use``\ s.  Though not mandatory for layout a), we stick to this
315606f32e7eSjoergredundancy for the sake of simplicity.  The ``User`` object also stores the
315706f32e7eSjoergnumber of ``Use`` objects it has. (Theoretically this information can also be
315806f32e7eSjoergcalculated given the scheme presented below.)
315906f32e7eSjoerg
316006f32e7eSjoergSpecial forms of allocation operators (``operator new``) enforce the following
316106f32e7eSjoergmemory layouts:
316206f32e7eSjoerg
316306f32e7eSjoerg* Layout a) is modelled by prepending the ``User`` object by the ``Use[]``
316406f32e7eSjoerg  array.
316506f32e7eSjoerg
316606f32e7eSjoerg  .. code-block:: none
316706f32e7eSjoerg
316806f32e7eSjoerg    ...---.---.---.---.-------...
316906f32e7eSjoerg      | P | P | P | P | User
317006f32e7eSjoerg    '''---'---'---'---'-------'''
317106f32e7eSjoerg
317206f32e7eSjoerg* Layout b) is modelled by pointing at the ``Use[]`` array.
317306f32e7eSjoerg
317406f32e7eSjoerg  .. code-block:: none
317506f32e7eSjoerg
317606f32e7eSjoerg    .-------...
317706f32e7eSjoerg    | User
317806f32e7eSjoerg    '-------'''
317906f32e7eSjoerg        |
318006f32e7eSjoerg        v
318106f32e7eSjoerg        .---.---.---.---...
318206f32e7eSjoerg        | P | P | P | P |
318306f32e7eSjoerg        '---'---'---'---'''
318406f32e7eSjoerg
318506f32e7eSjoerg*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in
318606f32e7eSjoergeach* ``Use`` *object in the member* ``Use::Prev`` *)*
318706f32e7eSjoerg
318806f32e7eSjoerg.. _polymorphism:
318906f32e7eSjoerg
3190*da58b97aSjoergDesigning Type Hierarchies and Polymorphic Interfaces
319106f32e7eSjoerg-----------------------------------------------------
319206f32e7eSjoerg
319306f32e7eSjoergThere are two different design patterns that tend to result in the use of
319406f32e7eSjoergvirtual dispatch for methods in a type hierarchy in C++ programs. The first is
319506f32e7eSjoerga genuine type hierarchy where different types in the hierarchy model
319606f32e7eSjoerga specific subset of the functionality and semantics, and these types nest
319706f32e7eSjoergstrictly within each other. Good examples of this can be seen in the ``Value``
319806f32e7eSjoergor ``Type`` type hierarchies.
319906f32e7eSjoerg
320006f32e7eSjoergA second is the desire to dispatch dynamically across a collection of
320106f32e7eSjoergpolymorphic interface implementations. This latter use case can be modeled with
320206f32e7eSjoergvirtual dispatch and inheritance by defining an abstract interface base class
320306f32e7eSjoergwhich all implementations derive from and override. However, this
320406f32e7eSjoergimplementation strategy forces an **"is-a"** relationship to exist that is not
320506f32e7eSjoergactually meaningful. There is often not some nested hierarchy of useful
320606f32e7eSjoerggeneralizations which code might interact with and move up and down. Instead,
320706f32e7eSjoergthere is a singular interface which is dispatched across a range of
320806f32e7eSjoergimplementations.
320906f32e7eSjoerg
321006f32e7eSjoergThe preferred implementation strategy for the second use case is that of
321106f32e7eSjoerggeneric programming (sometimes called "compile-time duck typing" or "static
321206f32e7eSjoergpolymorphism"). For example, a template over some type parameter ``T`` can be
321306f32e7eSjoerginstantiated across any particular implementation that conforms to the
321406f32e7eSjoerginterface or *concept*. A good example here is the highly generic properties of
321506f32e7eSjoergany type which models a node in a directed graph. LLVM models these primarily
321606f32e7eSjoergthrough templates and generic programming. Such templates include the
321706f32e7eSjoerg``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
321806f32e7eSjoergtruly needs **dynamic** dispatch you can generalize it using a technique
321906f32e7eSjoergcalled *concept-based polymorphism*. This pattern emulates the interfaces and
322006f32e7eSjoergbehaviors of templates using a very limited form of virtual dispatch for type
322106f32e7eSjoergerasure inside its implementation. You can find examples of this technique in
322206f32e7eSjoergthe ``PassManager.h`` system, and there is a more detailed introduction to it
322306f32e7eSjoergby Sean Parent in several of his talks and papers:
322406f32e7eSjoerg
322506f32e7eSjoerg#. `Inheritance Is The Base Class of Evil
322606f32e7eSjoerg   <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
322706f32e7eSjoerg   - The GoingNative 2013 talk describing this technique, and probably the best
322806f32e7eSjoerg   place to start.
322906f32e7eSjoerg#. `Value Semantics and Concepts-based Polymorphism
323006f32e7eSjoerg   <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
323106f32e7eSjoerg   describing this technique in more detail.
323206f32e7eSjoerg#. `Sean Parent's Papers and Presentations
323306f32e7eSjoerg   <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
3234*da58b97aSjoerg   - A GitHub project full of links to slides, video, and sometimes code.
323506f32e7eSjoerg
323606f32e7eSjoergWhen deciding between creating a type hierarchy (with either tagged or virtual
323706f32e7eSjoergdispatch) and using templates or concepts-based polymorphism, consider whether
323806f32e7eSjoergthere is some refinement of an abstract base class which is a semantically
323906f32e7eSjoergmeaningful type on an interface boundary. If anything more refined than the
324006f32e7eSjoergroot abstract interface is meaningless to talk about as a partial extension of
324106f32e7eSjoergthe semantic model, then your use case likely fits better with polymorphism and
324206f32e7eSjoergyou should avoid using virtual dispatch. However, there may be some exigent
324306f32e7eSjoergcircumstances that require one technique or the other to be used.
324406f32e7eSjoerg
324506f32e7eSjoergIf you do need to introduce a type hierarchy, we prefer to use explicitly
324606f32e7eSjoergclosed type hierarchies with manual tagged dispatch and/or RTTI rather than the
324706f32e7eSjoergopen inheritance model and virtual dispatch that is more common in C++ code.
324806f32e7eSjoergThis is because LLVM rarely encourages library consumers to extend its core
324906f32e7eSjoergtypes, and leverages the closed and tag-dispatched nature of its hierarchies to
325006f32e7eSjoerggenerate significantly more efficient code. We have also found that a large
325106f32e7eSjoergamount of our usage of type hierarchies fits better with tag-based pattern
325206f32e7eSjoergmatching rather than dynamic dispatch across a common interface. Within LLVM we
325306f32e7eSjoerghave built custom helpers to facilitate this design. See this document's
325406f32e7eSjoergsection on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
325506f32e7eSjoerg<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
325606f32e7eSjoergpattern for use with the LLVM helpers.
325706f32e7eSjoerg
325806f32e7eSjoerg.. _abi_breaking_checks:
325906f32e7eSjoerg
326006f32e7eSjoergABI Breaking Checks
326106f32e7eSjoerg-------------------
326206f32e7eSjoerg
326306f32e7eSjoergChecks and asserts that alter the LLVM C++ ABI are predicated on the
326406f32e7eSjoergpreprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM
326506f32e7eSjoerglibraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI
326606f32e7eSjoergcompatible LLVM libraries built without it defined.  By default,
326706f32e7eSjoergturning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS`
326806f32e7eSjoergso a default +Asserts build is not ABI compatible with a
326906f32e7eSjoergdefault -Asserts build.  Clients that want ABI compatibility
327006f32e7eSjoergbetween +Asserts and -Asserts builds should use the CMake build system
327106f32e7eSjoergto set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently
327206f32e7eSjoergof `LLVM_ENABLE_ASSERTIONS`.
327306f32e7eSjoerg
327406f32e7eSjoerg.. _coreclasses:
327506f32e7eSjoerg
327606f32e7eSjoergThe Core LLVM Class Hierarchy Reference
327706f32e7eSjoerg=======================================
327806f32e7eSjoerg
327906f32e7eSjoerg``#include "llvm/IR/Type.h"``
328006f32e7eSjoerg
3281*da58b97aSjoergheader source: `Type.h <https://llvm.org/doxygen/Type_8h_source.html>`_
328206f32e7eSjoerg
3283*da58b97aSjoergdoxygen info: `Type Classes <https://llvm.org/doxygen/classllvm_1_1Type.html>`_
328406f32e7eSjoerg
328506f32e7eSjoergThe Core LLVM classes are the primary means of representing the program being
328606f32e7eSjoerginspected or transformed.  The core LLVM classes are defined in header files in
328706f32e7eSjoergthe ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
328806f32e7eSjoergdirectory. It's worth noting that, for historical reasons, this library is
328906f32e7eSjoergcalled ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
329006f32e7eSjoerg
329106f32e7eSjoerg.. _Type:
329206f32e7eSjoerg
329306f32e7eSjoergThe Type class and Derived Types
329406f32e7eSjoerg--------------------------------
329506f32e7eSjoerg
329606f32e7eSjoerg``Type`` is a superclass of all type classes.  Every ``Value`` has a ``Type``.
329706f32e7eSjoerg``Type`` cannot be instantiated directly but only through its subclasses.
329806f32e7eSjoergCertain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and
329906f32e7eSjoerg``DoubleType``) have hidden subclasses.  They are hidden because they offer no
330006f32e7eSjoerguseful functionality beyond what the ``Type`` class offers except to distinguish
330106f32e7eSjoergthemselves from other subclasses of ``Type``.
330206f32e7eSjoerg
330306f32e7eSjoergAll other types are subclasses of ``DerivedType``.  Types can be named, but this
330406f32e7eSjoergis not a requirement.  There exists exactly one instance of a given shape at any
330506f32e7eSjoergone time.  This allows type equality to be performed with address equality of
330606f32e7eSjoergthe Type Instance.  That is, given two ``Type*`` values, the types are identical
330706f32e7eSjoergif the pointers are identical.
330806f32e7eSjoerg
330906f32e7eSjoerg.. _m_Type:
331006f32e7eSjoerg
331106f32e7eSjoergImportant Public Methods
331206f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^
331306f32e7eSjoerg
331406f32e7eSjoerg* ``bool isIntegerTy() const``: Returns true for any integer type.
331506f32e7eSjoerg
331606f32e7eSjoerg* ``bool isFloatingPointTy()``: Return true if this is one of the five
331706f32e7eSjoerg  floating point types.
331806f32e7eSjoerg
331906f32e7eSjoerg* ``bool isSized()``: Return true if the type has known size.  Things
332006f32e7eSjoerg  that don't have a size are abstract types, labels and void.
332106f32e7eSjoerg
332206f32e7eSjoerg.. _derivedtypes:
332306f32e7eSjoerg
332406f32e7eSjoergImportant Derived Types
332506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^
332606f32e7eSjoerg
332706f32e7eSjoerg``IntegerType``
332806f32e7eSjoerg  Subclass of DerivedType that represents integer types of any bit width.  Any
332906f32e7eSjoerg  bit width between ``IntegerType::MIN_INT_BITS`` (1) and
333006f32e7eSjoerg  ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented.
333106f32e7eSjoerg
333206f32e7eSjoerg  * ``static const IntegerType* get(unsigned NumBits)``: get an integer
333306f32e7eSjoerg    type of a specific bit width.
333406f32e7eSjoerg
333506f32e7eSjoerg  * ``unsigned getBitWidth() const``: Get the bit width of an integer type.
333606f32e7eSjoerg
333706f32e7eSjoerg``SequentialType``
333806f32e7eSjoerg  This is subclassed by ArrayType and VectorType.
333906f32e7eSjoerg
334006f32e7eSjoerg  * ``const Type * getElementType() const``: Returns the type of each
334106f32e7eSjoerg    of the elements in the sequential type.
334206f32e7eSjoerg
334306f32e7eSjoerg  * ``uint64_t getNumElements() const``: Returns the number of elements
334406f32e7eSjoerg    in the sequential type.
334506f32e7eSjoerg
334606f32e7eSjoerg``ArrayType``
334706f32e7eSjoerg  This is a subclass of SequentialType and defines the interface for array
334806f32e7eSjoerg  types.
334906f32e7eSjoerg
335006f32e7eSjoerg``PointerType``
335106f32e7eSjoerg  Subclass of Type for pointer types.
335206f32e7eSjoerg
335306f32e7eSjoerg``VectorType``
335406f32e7eSjoerg  Subclass of SequentialType for vector types.  A vector type is similar to an
335506f32e7eSjoerg  ArrayType but is distinguished because it is a first class type whereas
335606f32e7eSjoerg  ArrayType is not.  Vector types are used for vector operations and are usually
335706f32e7eSjoerg  small vectors of an integer or floating point type.
335806f32e7eSjoerg
335906f32e7eSjoerg``StructType``
336006f32e7eSjoerg  Subclass of DerivedTypes for struct types.
336106f32e7eSjoerg
336206f32e7eSjoerg.. _FunctionType:
336306f32e7eSjoerg
336406f32e7eSjoerg``FunctionType``
336506f32e7eSjoerg  Subclass of DerivedTypes for function types.
336606f32e7eSjoerg
336706f32e7eSjoerg  * ``bool isVarArg() const``: Returns true if it's a vararg function.
336806f32e7eSjoerg
336906f32e7eSjoerg  * ``const Type * getReturnType() const``: Returns the return type of the
337006f32e7eSjoerg    function.
337106f32e7eSjoerg
337206f32e7eSjoerg  * ``const Type * getParamType (unsigned i)``: Returns the type of the ith
337306f32e7eSjoerg    parameter.
337406f32e7eSjoerg
337506f32e7eSjoerg  * ``const unsigned getNumParams() const``: Returns the number of formal
337606f32e7eSjoerg    parameters.
337706f32e7eSjoerg
337806f32e7eSjoerg.. _Module:
337906f32e7eSjoerg
338006f32e7eSjoergThe ``Module`` class
338106f32e7eSjoerg--------------------
338206f32e7eSjoerg
338306f32e7eSjoerg``#include "llvm/IR/Module.h"``
338406f32e7eSjoerg
3385*da58b97aSjoergheader source: `Module.h <https://llvm.org/doxygen/Module_8h_source.html>`_
338606f32e7eSjoerg
3387*da58b97aSjoergdoxygen info: `Module Class <https://llvm.org/doxygen/classllvm_1_1Module.html>`_
338806f32e7eSjoerg
338906f32e7eSjoergThe ``Module`` class represents the top level structure present in LLVM
339006f32e7eSjoergprograms.  An LLVM module is effectively either a translation unit of the
339106f32e7eSjoergoriginal program or a combination of several translation units merged by the
339206f32e7eSjoerglinker.  The ``Module`` class keeps track of a list of :ref:`Function
339306f32e7eSjoerg<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_.
339406f32e7eSjoergAdditionally, it contains a few helpful member functions that try to make common
339506f32e7eSjoergoperations easy.
339606f32e7eSjoerg
339706f32e7eSjoerg.. _m_Module:
339806f32e7eSjoerg
339906f32e7eSjoergImportant Public Members of the ``Module`` class
340006f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
340106f32e7eSjoerg
340206f32e7eSjoerg* ``Module::Module(std::string name = "")``
340306f32e7eSjoerg
340406f32e7eSjoerg  Constructing a Module_ is easy.  You can optionally provide a name for it
340506f32e7eSjoerg  (probably based on the name of the translation unit).
340606f32e7eSjoerg
340706f32e7eSjoerg* | ``Module::iterator`` - Typedef for function list iterator
340806f32e7eSjoerg  | ``Module::const_iterator`` - Typedef for const_iterator.
340906f32e7eSjoerg  | ``begin()``, ``end()``, ``size()``, ``empty()``
341006f32e7eSjoerg
341106f32e7eSjoerg  These are forwarding methods that make it easy to access the contents of a
341206f32e7eSjoerg  ``Module`` object's :ref:`Function <c_Function>` list.
341306f32e7eSjoerg
341406f32e7eSjoerg* ``Module::FunctionListType &getFunctionList()``
341506f32e7eSjoerg
341606f32e7eSjoerg  Returns the list of :ref:`Function <c_Function>`\ s.  This is necessary to use
341706f32e7eSjoerg  when you need to update the list or perform a complex action that doesn't have
341806f32e7eSjoerg  a forwarding method.
341906f32e7eSjoerg
342006f32e7eSjoerg----------------
342106f32e7eSjoerg
342206f32e7eSjoerg* | ``Module::global_iterator`` - Typedef for global variable list iterator
342306f32e7eSjoerg  | ``Module::const_global_iterator`` - Typedef for const_iterator.
342406f32e7eSjoerg  | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()``
342506f32e7eSjoerg
342606f32e7eSjoerg  These are forwarding methods that make it easy to access the contents of a
342706f32e7eSjoerg  ``Module`` object's GlobalVariable_ list.
342806f32e7eSjoerg
342906f32e7eSjoerg* ``Module::GlobalListType &getGlobalList()``
343006f32e7eSjoerg
343106f32e7eSjoerg  Returns the list of GlobalVariable_\ s.  This is necessary to use when you
343206f32e7eSjoerg  need to update the list or perform a complex action that doesn't have a
343306f32e7eSjoerg  forwarding method.
343406f32e7eSjoerg
343506f32e7eSjoerg----------------
343606f32e7eSjoerg
343706f32e7eSjoerg* ``SymbolTable *getSymbolTable()``
343806f32e7eSjoerg
343906f32e7eSjoerg  Return a reference to the SymbolTable_ for this ``Module``.
344006f32e7eSjoerg
344106f32e7eSjoerg----------------
344206f32e7eSjoerg
344306f32e7eSjoerg* ``Function *getFunction(StringRef Name) const``
344406f32e7eSjoerg
344506f32e7eSjoerg  Look up the specified function in the ``Module`` SymbolTable_.  If it does not
344606f32e7eSjoerg  exist, return ``null``.
344706f32e7eSjoerg
344806f32e7eSjoerg* ``FunctionCallee getOrInsertFunction(const std::string &Name,
344906f32e7eSjoerg  const FunctionType *T)``
345006f32e7eSjoerg
345106f32e7eSjoerg  Look up the specified function in the ``Module`` SymbolTable_.  If
345206f32e7eSjoerg  it does not exist, add an external declaration for the function and
345306f32e7eSjoerg  return it. Note that the function signature already present may not
345406f32e7eSjoerg  match the requested signature. Thus, in order to enable the common
345506f32e7eSjoerg  usage of passing the result directly to EmitCall, the return type is
345606f32e7eSjoerg  a struct of ``{FunctionType *T, Constant *FunctionPtr}``, rather
345706f32e7eSjoerg  than simply the ``Function*`` with potentially an unexpected
345806f32e7eSjoerg  signature.
345906f32e7eSjoerg
346006f32e7eSjoerg* ``std::string getTypeName(const Type *Ty)``
346106f32e7eSjoerg
346206f32e7eSjoerg  If there is at least one entry in the SymbolTable_ for the specified Type_,
346306f32e7eSjoerg  return it.  Otherwise return the empty string.
346406f32e7eSjoerg
346506f32e7eSjoerg* ``bool addTypeName(const std::string &Name, const Type *Ty)``
346606f32e7eSjoerg
346706f32e7eSjoerg  Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``.  If there is
346806f32e7eSjoerg  already an entry for this name, true is returned and the SymbolTable_ is not
346906f32e7eSjoerg  modified.
347006f32e7eSjoerg
347106f32e7eSjoerg.. _Value:
347206f32e7eSjoerg
347306f32e7eSjoergThe ``Value`` class
347406f32e7eSjoerg-------------------
347506f32e7eSjoerg
347606f32e7eSjoerg``#include "llvm/IR/Value.h"``
347706f32e7eSjoerg
3478*da58b97aSjoergheader source: `Value.h <https://llvm.org/doxygen/Value_8h_source.html>`_
347906f32e7eSjoerg
3480*da58b97aSjoergdoxygen info: `Value Class <https://llvm.org/doxygen/classllvm_1_1Value.html>`_
348106f32e7eSjoerg
348206f32e7eSjoergThe ``Value`` class is the most important class in the LLVM Source base.  It
348306f32e7eSjoergrepresents a typed value that may be used (among other things) as an operand to
348406f32e7eSjoergan instruction.  There are many different types of ``Value``\ s, such as
348506f32e7eSjoergConstant_\ s, Argument_\ s.  Even Instruction_\ s and :ref:`Function
348606f32e7eSjoerg<c_Function>`\ s are ``Value``\ s.
348706f32e7eSjoerg
348806f32e7eSjoergA particular ``Value`` may be used many times in the LLVM representation for a
348906f32e7eSjoergprogram.  For example, an incoming argument to a function (represented with an
349006f32e7eSjoerginstance of the Argument_ class) is "used" by every instruction in the function
349106f32e7eSjoergthat references the argument.  To keep track of this relationship, the ``Value``
349206f32e7eSjoergclass keeps a list of all of the ``User``\ s that is using it (the User_ class
349306f32e7eSjoergis a base class for all nodes in the LLVM graph that can refer to ``Value``\ s).
349406f32e7eSjoergThis use list is how LLVM represents def-use information in the program, and is
349506f32e7eSjoergaccessible through the ``use_*`` methods, shown below.
349606f32e7eSjoerg
349706f32e7eSjoergBecause LLVM is a typed representation, every LLVM ``Value`` is typed, and this
349806f32e7eSjoergType_ is available through the ``getType()`` method.  In addition, all LLVM
349906f32e7eSjoergvalues can be named.  The "name" of the ``Value`` is a symbolic string printed
350006f32e7eSjoergin the LLVM code:
350106f32e7eSjoerg
350206f32e7eSjoerg.. code-block:: llvm
350306f32e7eSjoerg
350406f32e7eSjoerg  %foo = add i32 1, 2
350506f32e7eSjoerg
350606f32e7eSjoerg.. _nameWarning:
350706f32e7eSjoerg
350806f32e7eSjoergThe name of this instruction is "foo". **NOTE** that the name of any value may
350906f32e7eSjoergbe missing (an empty string), so names should **ONLY** be used for debugging
351006f32e7eSjoerg(making the source code easier to read, debugging printouts), they should not be
351106f32e7eSjoergused to keep track of values or map between them.  For this purpose, use a
351206f32e7eSjoerg``std::map`` of pointers to the ``Value`` itself instead.
351306f32e7eSjoerg
351406f32e7eSjoergOne important aspect of LLVM is that there is no distinction between an SSA
351506f32e7eSjoergvariable and the operation that produces it.  Because of this, any reference to
351606f32e7eSjoergthe value produced by an instruction (or the value available as an incoming
351706f32e7eSjoergargument, for example) is represented as a direct pointer to the instance of the
351806f32e7eSjoergclass that represents this value.  Although this may take some getting used to,
351906f32e7eSjoergit simplifies the representation and makes it easier to manipulate.
352006f32e7eSjoerg
352106f32e7eSjoerg.. _m_Value:
352206f32e7eSjoerg
352306f32e7eSjoergImportant Public Members of the ``Value`` class
352406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
352506f32e7eSjoerg
352606f32e7eSjoerg* | ``Value::use_iterator`` - Typedef for iterator over the use-list
352706f32e7eSjoerg  | ``Value::const_use_iterator`` - Typedef for const_iterator over the
352806f32e7eSjoerg    use-list
352906f32e7eSjoerg  | ``unsigned use_size()`` - Returns the number of users of the value.
353006f32e7eSjoerg  | ``bool use_empty()`` - Returns true if there are no users.
353106f32e7eSjoerg  | ``use_iterator use_begin()`` - Get an iterator to the start of the
353206f32e7eSjoerg    use-list.
353306f32e7eSjoerg  | ``use_iterator use_end()`` - Get an iterator to the end of the use-list.
353406f32e7eSjoerg  | ``User *use_back()`` - Returns the last element in the list.
353506f32e7eSjoerg
353606f32e7eSjoerg  These methods are the interface to access the def-use information in LLVM.
353706f32e7eSjoerg  As with all other iterators in LLVM, the naming conventions follow the
353806f32e7eSjoerg  conventions defined by the STL_.
353906f32e7eSjoerg
354006f32e7eSjoerg* ``Type *getType() const``
354106f32e7eSjoerg  This method returns the Type of the Value.
354206f32e7eSjoerg
354306f32e7eSjoerg* | ``bool hasName() const``
354406f32e7eSjoerg  | ``std::string getName() const``
354506f32e7eSjoerg  | ``void setName(const std::string &Name)``
354606f32e7eSjoerg
354706f32e7eSjoerg  This family of methods is used to access and assign a name to a ``Value``, be
354806f32e7eSjoerg  aware of the :ref:`precaution above <nameWarning>`.
354906f32e7eSjoerg
355006f32e7eSjoerg* ``void replaceAllUsesWith(Value *V)``
355106f32e7eSjoerg
355206f32e7eSjoerg  This method traverses the use list of a ``Value`` changing all User_\ s of the
355306f32e7eSjoerg  current value to refer to "``V``" instead.  For example, if you detect that an
355406f32e7eSjoerg  instruction always produces a constant value (for example through constant
355506f32e7eSjoerg  folding), you can replace all uses of the instruction with the constant like
355606f32e7eSjoerg  this:
355706f32e7eSjoerg
355806f32e7eSjoerg  .. code-block:: c++
355906f32e7eSjoerg
356006f32e7eSjoerg    Inst->replaceAllUsesWith(ConstVal);
356106f32e7eSjoerg
356206f32e7eSjoerg.. _User:
356306f32e7eSjoerg
356406f32e7eSjoergThe ``User`` class
356506f32e7eSjoerg------------------
356606f32e7eSjoerg
356706f32e7eSjoerg``#include "llvm/IR/User.h"``
356806f32e7eSjoerg
3569*da58b97aSjoergheader source: `User.h <https://llvm.org/doxygen/User_8h_source.html>`_
357006f32e7eSjoerg
3571*da58b97aSjoergdoxygen info: `User Class <https://llvm.org/doxygen/classllvm_1_1User.html>`_
357206f32e7eSjoerg
357306f32e7eSjoergSuperclass: Value_
357406f32e7eSjoerg
357506f32e7eSjoergThe ``User`` class is the common base class of all LLVM nodes that may refer to
357606f32e7eSjoerg``Value``\ s.  It exposes a list of "Operands" that are all of the ``Value``\ s
357706f32e7eSjoergthat the User is referring to.  The ``User`` class itself is a subclass of
357806f32e7eSjoerg``Value``.
357906f32e7eSjoerg
358006f32e7eSjoergThe operands of a ``User`` point directly to the LLVM ``Value`` that it refers
358106f32e7eSjoergto.  Because LLVM uses Static Single Assignment (SSA) form, there can only be
358206f32e7eSjoergone definition referred to, allowing this direct connection.  This connection
358306f32e7eSjoergprovides the use-def information in LLVM.
358406f32e7eSjoerg
358506f32e7eSjoerg.. _m_User:
358606f32e7eSjoerg
358706f32e7eSjoergImportant Public Members of the ``User`` class
358806f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
358906f32e7eSjoerg
359006f32e7eSjoergThe ``User`` class exposes the operand list in two ways: through an index access
359106f32e7eSjoerginterface and through an iterator based interface.
359206f32e7eSjoerg
359306f32e7eSjoerg* | ``Value *getOperand(unsigned i)``
359406f32e7eSjoerg  | ``unsigned getNumOperands()``
359506f32e7eSjoerg
359606f32e7eSjoerg  These two methods expose the operands of the ``User`` in a convenient form for
359706f32e7eSjoerg  direct access.
359806f32e7eSjoerg
359906f32e7eSjoerg* | ``User::op_iterator`` - Typedef for iterator over the operand list
360006f32e7eSjoerg  | ``op_iterator op_begin()`` - Get an iterator to the start of the operand
360106f32e7eSjoerg    list.
360206f32e7eSjoerg  | ``op_iterator op_end()`` - Get an iterator to the end of the operand list.
360306f32e7eSjoerg
360406f32e7eSjoerg  Together, these methods make up the iterator based interface to the operands
360506f32e7eSjoerg  of a ``User``.
360606f32e7eSjoerg
360706f32e7eSjoerg
360806f32e7eSjoerg.. _Instruction:
360906f32e7eSjoerg
361006f32e7eSjoergThe ``Instruction`` class
361106f32e7eSjoerg-------------------------
361206f32e7eSjoerg
361306f32e7eSjoerg``#include "llvm/IR/Instruction.h"``
361406f32e7eSjoerg
361506f32e7eSjoergheader source: `Instruction.h
3616*da58b97aSjoerg<https://llvm.org/doxygen/Instruction_8h_source.html>`_
361706f32e7eSjoerg
361806f32e7eSjoergdoxygen info: `Instruction Class
3619*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_
362006f32e7eSjoerg
362106f32e7eSjoergSuperclasses: User_, Value_
362206f32e7eSjoerg
362306f32e7eSjoergThe ``Instruction`` class is the common base class for all LLVM instructions.
362406f32e7eSjoergIt provides only a few methods, but is a very commonly used class.  The primary
362506f32e7eSjoergdata tracked by the ``Instruction`` class itself is the opcode (instruction
362606f32e7eSjoergtype) and the parent BasicBlock_ the ``Instruction`` is embedded into.  To
362706f32e7eSjoergrepresent a specific type of instruction, one of many subclasses of
362806f32e7eSjoerg``Instruction`` are used.
362906f32e7eSjoerg
363006f32e7eSjoergBecause the ``Instruction`` class subclasses the User_ class, its operands can
363106f32e7eSjoergbe accessed in the same way as for other ``User``\ s (with the
363206f32e7eSjoerg``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods).
363306f32e7eSjoergAn important file for the ``Instruction`` class is the ``llvm/Instruction.def``
363406f32e7eSjoergfile.  This file contains some meta-data about the various different types of
363506f32e7eSjoerginstructions in LLVM.  It describes the enum values that are used as opcodes
363606f32e7eSjoerg(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the
363706f32e7eSjoergconcrete sub-classes of ``Instruction`` that implement the instruction (for
363806f32e7eSjoergexample BinaryOperator_ and CmpInst_).  Unfortunately, the use of macros in this
363906f32e7eSjoergfile confuses doxygen, so these enum values don't show up correctly in the
3640*da58b97aSjoerg`doxygen output <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
364106f32e7eSjoerg
364206f32e7eSjoerg.. _s_Instruction:
364306f32e7eSjoerg
364406f32e7eSjoergImportant Subclasses of the ``Instruction`` class
364506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
364606f32e7eSjoerg
364706f32e7eSjoerg.. _BinaryOperator:
364806f32e7eSjoerg
364906f32e7eSjoerg* ``BinaryOperator``
365006f32e7eSjoerg
365106f32e7eSjoerg  This subclasses represents all two operand instructions whose operands must be
365206f32e7eSjoerg  the same type, except for the comparison instructions.
365306f32e7eSjoerg
365406f32e7eSjoerg.. _CastInst:
365506f32e7eSjoerg
365606f32e7eSjoerg* ``CastInst``
365706f32e7eSjoerg  This subclass is the parent of the 12 casting instructions.  It provides
365806f32e7eSjoerg  common operations on cast instructions.
365906f32e7eSjoerg
366006f32e7eSjoerg.. _CmpInst:
366106f32e7eSjoerg
366206f32e7eSjoerg* ``CmpInst``
366306f32e7eSjoerg
366406f32e7eSjoerg  This subclass represents the two comparison instructions,
3665*da58b97aSjoerg  `ICmpInst <LangRef.html#i_icmp>`_ (integer operands), and
366606f32e7eSjoerg  `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
366706f32e7eSjoerg
366806f32e7eSjoerg.. _m_Instruction:
366906f32e7eSjoerg
367006f32e7eSjoergImportant Public Members of the ``Instruction`` class
367106f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
367206f32e7eSjoerg
367306f32e7eSjoerg* ``BasicBlock *getParent()``
367406f32e7eSjoerg
367506f32e7eSjoerg  Returns the BasicBlock_ that this
367606f32e7eSjoerg  ``Instruction`` is embedded into.
367706f32e7eSjoerg
367806f32e7eSjoerg* ``bool mayWriteToMemory()``
367906f32e7eSjoerg
368006f32e7eSjoerg  Returns true if the instruction writes to memory, i.e. it is a ``call``,
368106f32e7eSjoerg  ``free``, ``invoke``, or ``store``.
368206f32e7eSjoerg
368306f32e7eSjoerg* ``unsigned getOpcode()``
368406f32e7eSjoerg
368506f32e7eSjoerg  Returns the opcode for the ``Instruction``.
368606f32e7eSjoerg
368706f32e7eSjoerg* ``Instruction *clone() const``
368806f32e7eSjoerg
368906f32e7eSjoerg  Returns another instance of the specified instruction, identical in all ways
369006f32e7eSjoerg  to the original except that the instruction has no parent (i.e. it's not
369106f32e7eSjoerg  embedded into a BasicBlock_), and it has no name.
369206f32e7eSjoerg
369306f32e7eSjoerg.. _Constant:
369406f32e7eSjoerg
369506f32e7eSjoergThe ``Constant`` class and subclasses
369606f32e7eSjoerg-------------------------------------
369706f32e7eSjoerg
369806f32e7eSjoergConstant represents a base class for different types of constants.  It is
369906f32e7eSjoergsubclassed by ConstantInt, ConstantArray, etc. for representing the various
370006f32e7eSjoergtypes of Constants.  GlobalValue_ is also a subclass, which represents the
370106f32e7eSjoergaddress of a global variable or function.
370206f32e7eSjoerg
370306f32e7eSjoerg.. _s_Constant:
370406f32e7eSjoerg
370506f32e7eSjoergImportant Subclasses of Constant
370606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
370706f32e7eSjoerg
370806f32e7eSjoerg* ConstantInt : This subclass of Constant represents an integer constant of
370906f32e7eSjoerg  any width.
371006f32e7eSjoerg
371106f32e7eSjoerg  * ``const APInt& getValue() const``: Returns the underlying
371206f32e7eSjoerg    value of this constant, an APInt value.
371306f32e7eSjoerg
371406f32e7eSjoerg  * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
371506f32e7eSjoerg    int64_t via sign extension.  If the value (not the bit width) of the APInt
371606f32e7eSjoerg    is too large to fit in an int64_t, an assertion will result.  For this
371706f32e7eSjoerg    reason, use of this method is discouraged.
371806f32e7eSjoerg
371906f32e7eSjoerg  * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
372006f32e7eSjoerg    to a uint64_t via zero extension.  IF the value (not the bit width) of the
372106f32e7eSjoerg    APInt is too large to fit in a uint64_t, an assertion will result.  For this
372206f32e7eSjoerg    reason, use of this method is discouraged.
372306f32e7eSjoerg
372406f32e7eSjoerg  * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
372506f32e7eSjoerg    object that represents the value provided by ``Val``.  The type is implied
372606f32e7eSjoerg    as the IntegerType that corresponds to the bit width of ``Val``.
372706f32e7eSjoerg
372806f32e7eSjoerg  * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
372906f32e7eSjoerg    ConstantInt object that represents the value provided by ``Val`` for integer
373006f32e7eSjoerg    type ``Ty``.
373106f32e7eSjoerg
373206f32e7eSjoerg* ConstantFP : This class represents a floating point constant.
373306f32e7eSjoerg
373406f32e7eSjoerg  * ``double getValue() const``: Returns the underlying value of this constant.
373506f32e7eSjoerg
373606f32e7eSjoerg* ConstantArray : This represents a constant array.
373706f32e7eSjoerg
373806f32e7eSjoerg  * ``const std::vector<Use> &getValues() const``: Returns a vector of
373906f32e7eSjoerg    component constants that makeup this array.
374006f32e7eSjoerg
374106f32e7eSjoerg* ConstantStruct : This represents a constant struct.
374206f32e7eSjoerg
374306f32e7eSjoerg  * ``const std::vector<Use> &getValues() const``: Returns a vector of
374406f32e7eSjoerg    component constants that makeup this array.
374506f32e7eSjoerg
374606f32e7eSjoerg* GlobalValue : This represents either a global variable or a function.  In
374706f32e7eSjoerg  either case, the value is a constant fixed address (after linking).
374806f32e7eSjoerg
374906f32e7eSjoerg.. _GlobalValue:
375006f32e7eSjoerg
375106f32e7eSjoergThe ``GlobalValue`` class
375206f32e7eSjoerg-------------------------
375306f32e7eSjoerg
375406f32e7eSjoerg``#include "llvm/IR/GlobalValue.h"``
375506f32e7eSjoerg
375606f32e7eSjoergheader source: `GlobalValue.h
3757*da58b97aSjoerg<https://llvm.org/doxygen/GlobalValue_8h_source.html>`_
375806f32e7eSjoerg
375906f32e7eSjoergdoxygen info: `GlobalValue Class
3760*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_
376106f32e7eSjoerg
376206f32e7eSjoergSuperclasses: Constant_, User_, Value_
376306f32e7eSjoerg
376406f32e7eSjoergGlobal values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the
376506f32e7eSjoergonly LLVM values that are visible in the bodies of all :ref:`Function
376606f32e7eSjoerg<c_Function>`\ s.  Because they are visible at global scope, they are also
376706f32e7eSjoergsubject to linking with other globals defined in different translation units.
376806f32e7eSjoergTo control the linking process, ``GlobalValue``\ s know their linkage rules.
376906f32e7eSjoergSpecifically, ``GlobalValue``\ s know whether they have internal or external
377006f32e7eSjoerglinkage, as defined by the ``LinkageTypes`` enumeration.
377106f32e7eSjoerg
377206f32e7eSjoergIf a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C),
377306f32e7eSjoergit is not visible to code outside the current translation unit, and does not
377406f32e7eSjoergparticipate in linking.  If it has external linkage, it is visible to external
377506f32e7eSjoergcode, and does participate in linking.  In addition to linkage information,
377606f32e7eSjoerg``GlobalValue``\ s keep track of which Module_ they are currently part of.
377706f32e7eSjoerg
377806f32e7eSjoergBecause ``GlobalValue``\ s are memory objects, they are always referred to by
377906f32e7eSjoergtheir **address**.  As such, the Type_ of a global is always a pointer to its
378006f32e7eSjoergcontents.  It is important to remember this when using the ``GetElementPtrInst``
378106f32e7eSjoerginstruction because this pointer must be dereferenced first.  For example, if
378206f32e7eSjoergyou have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array
378306f32e7eSjoergof 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to
378406f32e7eSjoergthat array.  Although the address of the first element of this array and the
378506f32e7eSjoergvalue of the ``GlobalVariable`` are the same, they have different types.  The
378606f32e7eSjoerg``GlobalVariable``'s type is ``[24 x i32]``.  The first element's type is
378706f32e7eSjoerg``i32.`` Because of this, accessing a global value requires you to dereference
378806f32e7eSjoergthe pointer with ``GetElementPtrInst`` first, then its elements can be accessed.
378906f32e7eSjoergThis is explained in the `LLVM Language Reference Manual
379006f32e7eSjoerg<LangRef.html#globalvars>`_.
379106f32e7eSjoerg
379206f32e7eSjoerg.. _m_GlobalValue:
379306f32e7eSjoerg
379406f32e7eSjoergImportant Public Members of the ``GlobalValue`` class
379506f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
379606f32e7eSjoerg
379706f32e7eSjoerg* | ``bool hasInternalLinkage() const``
379806f32e7eSjoerg  | ``bool hasExternalLinkage() const``
379906f32e7eSjoerg  | ``void setInternalLinkage(bool HasInternalLinkage)``
380006f32e7eSjoerg
380106f32e7eSjoerg  These methods manipulate the linkage characteristics of the ``GlobalValue``.
380206f32e7eSjoerg
380306f32e7eSjoerg* ``Module *getParent()``
380406f32e7eSjoerg
380506f32e7eSjoerg  This returns the Module_ that the
380606f32e7eSjoerg  GlobalValue is currently embedded into.
380706f32e7eSjoerg
380806f32e7eSjoerg.. _c_Function:
380906f32e7eSjoerg
381006f32e7eSjoergThe ``Function`` class
381106f32e7eSjoerg----------------------
381206f32e7eSjoerg
381306f32e7eSjoerg``#include "llvm/IR/Function.h"``
381406f32e7eSjoerg
3815*da58b97aSjoergheader source: `Function.h <https://llvm.org/doxygen/Function_8h_source.html>`_
381606f32e7eSjoerg
381706f32e7eSjoergdoxygen info: `Function Class
3818*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1Function.html>`_
381906f32e7eSjoerg
382006f32e7eSjoergSuperclasses: GlobalValue_, Constant_, User_, Value_
382106f32e7eSjoerg
382206f32e7eSjoergThe ``Function`` class represents a single procedure in LLVM.  It is actually
382306f32e7eSjoergone of the more complex classes in the LLVM hierarchy because it must keep track
382406f32e7eSjoergof a large amount of data.  The ``Function`` class keeps track of a list of
382506f32e7eSjoergBasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_.
382606f32e7eSjoerg
382706f32e7eSjoergThe list of BasicBlock_\ s is the most commonly used part of ``Function``
382806f32e7eSjoergobjects.  The list imposes an implicit ordering of the blocks in the function,
382906f32e7eSjoergwhich indicate how the code will be laid out by the backend.  Additionally, the
383006f32e7eSjoergfirst BasicBlock_ is the implicit entry node for the ``Function``.  It is not
383106f32e7eSjoerglegal in LLVM to explicitly branch to this initial block.  There are no implicit
383206f32e7eSjoergexit nodes, and in fact there may be multiple exit nodes from a single
383306f32e7eSjoerg``Function``.  If the BasicBlock_ list is empty, this indicates that the
383406f32e7eSjoerg``Function`` is actually a function declaration: the actual body of the function
383506f32e7eSjoerghasn't been linked in yet.
383606f32e7eSjoerg
383706f32e7eSjoergIn addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track
383806f32e7eSjoergof the list of formal Argument_\ s that the function receives.  This container
383906f32e7eSjoergmanages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does
384006f32e7eSjoergfor the BasicBlock_\ s.
384106f32e7eSjoerg
384206f32e7eSjoergThe SymbolTable_ is a very rarely used LLVM feature that is only used when you
384306f32e7eSjoerghave to look up a value by name.  Aside from that, the SymbolTable_ is used
384406f32e7eSjoerginternally to make sure that there are not conflicts between the names of
384506f32e7eSjoergInstruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body.
384606f32e7eSjoerg
384706f32e7eSjoergNote that ``Function`` is a GlobalValue_ and therefore also a Constant_.  The
384806f32e7eSjoergvalue of the function is its address (after linking) which is guaranteed to be
384906f32e7eSjoergconstant.
385006f32e7eSjoerg
385106f32e7eSjoerg.. _m_Function:
385206f32e7eSjoerg
385306f32e7eSjoergImportant Public Members of the ``Function``
385406f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
385506f32e7eSjoerg
385606f32e7eSjoerg* ``Function(const FunctionType *Ty, LinkageTypes Linkage,
385706f32e7eSjoerg  const std::string &N = "", Module* Parent = 0)``
385806f32e7eSjoerg
385906f32e7eSjoerg  Constructor used when you need to create new ``Function``\ s to add the
386006f32e7eSjoerg  program.  The constructor must specify the type of the function to create and
386106f32e7eSjoerg  what type of linkage the function should have.  The FunctionType_ argument
386206f32e7eSjoerg  specifies the formal arguments and return value for the function.  The same
386306f32e7eSjoerg  FunctionType_ value can be used to create multiple functions.  The ``Parent``
386406f32e7eSjoerg  argument specifies the Module in which the function is defined.  If this
386506f32e7eSjoerg  argument is provided, the function will automatically be inserted into that
386606f32e7eSjoerg  module's list of functions.
386706f32e7eSjoerg
386806f32e7eSjoerg* ``bool isDeclaration()``
386906f32e7eSjoerg
387006f32e7eSjoerg  Return whether or not the ``Function`` has a body defined.  If the function is
387106f32e7eSjoerg  "external", it does not have a body, and thus must be resolved by linking with
387206f32e7eSjoerg  a function defined in a different translation unit.
387306f32e7eSjoerg
387406f32e7eSjoerg* | ``Function::iterator`` - Typedef for basic block list iterator
387506f32e7eSjoerg  | ``Function::const_iterator`` - Typedef for const_iterator.
387606f32e7eSjoerg  | ``begin()``, ``end()``, ``size()``, ``empty()``
387706f32e7eSjoerg
387806f32e7eSjoerg  These are forwarding methods that make it easy to access the contents of a
387906f32e7eSjoerg  ``Function`` object's BasicBlock_ list.
388006f32e7eSjoerg
388106f32e7eSjoerg* ``Function::BasicBlockListType &getBasicBlockList()``
388206f32e7eSjoerg
388306f32e7eSjoerg  Returns the list of BasicBlock_\ s.  This is necessary to use when you need to
388406f32e7eSjoerg  update the list or perform a complex action that doesn't have a forwarding
388506f32e7eSjoerg  method.
388606f32e7eSjoerg
388706f32e7eSjoerg* | ``Function::arg_iterator`` - Typedef for the argument list iterator
388806f32e7eSjoerg  | ``Function::const_arg_iterator`` - Typedef for const_iterator.
388906f32e7eSjoerg  | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()``
389006f32e7eSjoerg
389106f32e7eSjoerg  These are forwarding methods that make it easy to access the contents of a
389206f32e7eSjoerg  ``Function`` object's Argument_ list.
389306f32e7eSjoerg
389406f32e7eSjoerg* ``Function::ArgumentListType &getArgumentList()``
389506f32e7eSjoerg
389606f32e7eSjoerg  Returns the list of Argument_.  This is necessary to use when you need to
389706f32e7eSjoerg  update the list or perform a complex action that doesn't have a forwarding
389806f32e7eSjoerg  method.
389906f32e7eSjoerg
390006f32e7eSjoerg* ``BasicBlock &getEntryBlock()``
390106f32e7eSjoerg
390206f32e7eSjoerg  Returns the entry ``BasicBlock`` for the function.  Because the entry block
390306f32e7eSjoerg  for the function is always the first block, this returns the first block of
390406f32e7eSjoerg  the ``Function``.
390506f32e7eSjoerg
390606f32e7eSjoerg* | ``Type *getReturnType()``
390706f32e7eSjoerg  | ``FunctionType *getFunctionType()``
390806f32e7eSjoerg
390906f32e7eSjoerg  This traverses the Type_ of the ``Function`` and returns the return type of
391006f32e7eSjoerg  the function, or the FunctionType_ of the actual function.
391106f32e7eSjoerg
391206f32e7eSjoerg* ``SymbolTable *getSymbolTable()``
391306f32e7eSjoerg
391406f32e7eSjoerg  Return a pointer to the SymbolTable_ for this ``Function``.
391506f32e7eSjoerg
391606f32e7eSjoerg.. _GlobalVariable:
391706f32e7eSjoerg
391806f32e7eSjoergThe ``GlobalVariable`` class
391906f32e7eSjoerg----------------------------
392006f32e7eSjoerg
392106f32e7eSjoerg``#include "llvm/IR/GlobalVariable.h"``
392206f32e7eSjoerg
392306f32e7eSjoergheader source: `GlobalVariable.h
3924*da58b97aSjoerg<https://llvm.org/doxygen/GlobalVariable_8h_source.html>`_
392506f32e7eSjoerg
392606f32e7eSjoergdoxygen info: `GlobalVariable Class
3927*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_
392806f32e7eSjoerg
392906f32e7eSjoergSuperclasses: GlobalValue_, Constant_, User_, Value_
393006f32e7eSjoerg
393106f32e7eSjoergGlobal variables are represented with the (surprise surprise) ``GlobalVariable``
393206f32e7eSjoergclass.  Like functions, ``GlobalVariable``\ s are also subclasses of
393306f32e7eSjoergGlobalValue_, and as such are always referenced by their address (global values
393406f32e7eSjoergmust live in memory, so their "name" refers to their constant address).  See
393506f32e7eSjoergGlobalValue_ for more on this.  Global variables may have an initial value
393606f32e7eSjoerg(which must be a Constant_), and if they have an initializer, they may be marked
393706f32e7eSjoergas "constant" themselves (indicating that their contents never change at
393806f32e7eSjoergruntime).
393906f32e7eSjoerg
394006f32e7eSjoerg.. _m_GlobalVariable:
394106f32e7eSjoerg
394206f32e7eSjoergImportant Public Members of the ``GlobalVariable`` class
394306f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
394406f32e7eSjoerg
394506f32e7eSjoerg* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage,
394606f32e7eSjoerg  Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)``
394706f32e7eSjoerg
394806f32e7eSjoerg  Create a new global variable of the specified type.  If ``isConstant`` is true
394906f32e7eSjoerg  then the global variable will be marked as unchanging for the program.  The
395006f32e7eSjoerg  Linkage parameter specifies the type of linkage (internal, external, weak,
395106f32e7eSjoerg  linkonce, appending) for the variable.  If the linkage is InternalLinkage,
395206f32e7eSjoerg  WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then
395306f32e7eSjoerg  the resultant global variable will have internal linkage.  AppendingLinkage
395406f32e7eSjoerg  concatenates together all instances (in different translation units) of the
395506f32e7eSjoerg  variable into a single variable but is only applicable to arrays.  See the
395606f32e7eSjoerg  `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details
395706f32e7eSjoerg  on linkage types.  Optionally an initializer, a name, and the module to put
395806f32e7eSjoerg  the variable into may be specified for the global variable as well.
395906f32e7eSjoerg
396006f32e7eSjoerg* ``bool isConstant() const``
396106f32e7eSjoerg
396206f32e7eSjoerg  Returns true if this is a global variable that is known not to be modified at
396306f32e7eSjoerg  runtime.
396406f32e7eSjoerg
396506f32e7eSjoerg* ``bool hasInitializer()``
396606f32e7eSjoerg
3967*da58b97aSjoerg  Returns true if this ``GlobalVariable`` has an initializer.
396806f32e7eSjoerg
396906f32e7eSjoerg* ``Constant *getInitializer()``
397006f32e7eSjoerg
397106f32e7eSjoerg  Returns the initial value for a ``GlobalVariable``.  It is not legal to call
397206f32e7eSjoerg  this method if there is no initializer.
397306f32e7eSjoerg
397406f32e7eSjoerg.. _BasicBlock:
397506f32e7eSjoerg
397606f32e7eSjoergThe ``BasicBlock`` class
397706f32e7eSjoerg------------------------
397806f32e7eSjoerg
397906f32e7eSjoerg``#include "llvm/IR/BasicBlock.h"``
398006f32e7eSjoerg
398106f32e7eSjoergheader source: `BasicBlock.h
3982*da58b97aSjoerg<https://llvm.org/doxygen/BasicBlock_8h_source.html>`_
398306f32e7eSjoerg
398406f32e7eSjoergdoxygen info: `BasicBlock Class
3985*da58b97aSjoerg<https://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_
398606f32e7eSjoerg
398706f32e7eSjoergSuperclass: Value_
398806f32e7eSjoerg
398906f32e7eSjoergThis class represents a single entry single exit section of the code, commonly
399006f32e7eSjoergknown as a basic block by the compiler community.  The ``BasicBlock`` class
399106f32e7eSjoergmaintains a list of Instruction_\ s, which form the body of the block.  Matching
399206f32e7eSjoergthe language definition, the last element of this list of instructions is always
399306f32e7eSjoerga terminator instruction.
399406f32e7eSjoerg
399506f32e7eSjoergIn addition to tracking the list of instructions that make up the block, the
399606f32e7eSjoerg``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that
399706f32e7eSjoergit is embedded into.
399806f32e7eSjoerg
399906f32e7eSjoergNote that ``BasicBlock``\ s themselves are Value_\ s, because they are
400006f32e7eSjoergreferenced by instructions like branches and can go in the switch tables.
400106f32e7eSjoerg``BasicBlock``\ s have type ``label``.
400206f32e7eSjoerg
400306f32e7eSjoerg.. _m_BasicBlock:
400406f32e7eSjoerg
400506f32e7eSjoergImportant Public Members of the ``BasicBlock`` class
400606f32e7eSjoerg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
400706f32e7eSjoerg
400806f32e7eSjoerg* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)``
400906f32e7eSjoerg
401006f32e7eSjoerg  The ``BasicBlock`` constructor is used to create new basic blocks for
401106f32e7eSjoerg  insertion into a function.  The constructor optionally takes a name for the
401206f32e7eSjoerg  new block, and a :ref:`Function <c_Function>` to insert it into.  If the
401306f32e7eSjoerg  ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
401406f32e7eSjoerg  inserted at the end of the specified :ref:`Function <c_Function>`, if not
401506f32e7eSjoerg  specified, the BasicBlock must be manually inserted into the :ref:`Function
401606f32e7eSjoerg  <c_Function>`.
401706f32e7eSjoerg
401806f32e7eSjoerg* | ``BasicBlock::iterator`` - Typedef for instruction list iterator
401906f32e7eSjoerg  | ``BasicBlock::const_iterator`` - Typedef for const_iterator.
402006f32e7eSjoerg  | ``begin()``, ``end()``, ``front()``, ``back()``,
402106f32e7eSjoerg    ``size()``, ``empty()``
402206f32e7eSjoerg    STL-style functions for accessing the instruction list.
402306f32e7eSjoerg
402406f32e7eSjoerg  These methods and typedefs are forwarding functions that have the same
402506f32e7eSjoerg  semantics as the standard library methods of the same names.  These methods
402606f32e7eSjoerg  expose the underlying instruction list of a basic block in a way that is easy
402706f32e7eSjoerg  to manipulate.  To get the full complement of container operations (including
402806f32e7eSjoerg  operations to update the list), you must use the ``getInstList()`` method.
402906f32e7eSjoerg
403006f32e7eSjoerg* ``BasicBlock::InstListType &getInstList()``
403106f32e7eSjoerg
403206f32e7eSjoerg  This method is used to get access to the underlying container that actually
403306f32e7eSjoerg  holds the Instructions.  This method must be used when there isn't a
403406f32e7eSjoerg  forwarding function in the ``BasicBlock`` class for the operation that you
403506f32e7eSjoerg  would like to perform.  Because there are no forwarding functions for
403606f32e7eSjoerg  "updating" operations, you need to use this if you want to update the contents
403706f32e7eSjoerg  of a ``BasicBlock``.
403806f32e7eSjoerg
403906f32e7eSjoerg* ``Function *getParent()``
404006f32e7eSjoerg
404106f32e7eSjoerg  Returns a pointer to :ref:`Function <c_Function>` the block is embedded into,
404206f32e7eSjoerg  or a null pointer if it is homeless.
404306f32e7eSjoerg
404406f32e7eSjoerg* ``Instruction *getTerminator()``
404506f32e7eSjoerg
404606f32e7eSjoerg  Returns a pointer to the terminator instruction that appears at the end of the
404706f32e7eSjoerg  ``BasicBlock``.  If there is no terminator instruction, or if the last
404806f32e7eSjoerg  instruction in the block is not a terminator, then a null pointer is returned.
404906f32e7eSjoerg
405006f32e7eSjoerg.. _Argument:
405106f32e7eSjoerg
405206f32e7eSjoergThe ``Argument`` class
405306f32e7eSjoerg----------------------
405406f32e7eSjoerg
405506f32e7eSjoergThis subclass of Value defines the interface for incoming formal arguments to a
405606f32e7eSjoergfunction.  A Function maintains a list of its formal arguments.  An argument has
405706f32e7eSjoerga pointer to the parent Function.
405806f32e7eSjoerg
405906f32e7eSjoerg
4060