============================
Taint Analysis Configuration
============================

The Clang Static Analyzer uses taint analysis to detect security-related issues in code.
The backbone of taint analysis in the Clang SA is the `GenericTaintChecker`, which the user can access via the :ref:`alpha-security-taint-TaintPropagation` checker alias. This checker comes with a default taint-related configuration.
The built-in default settings are defined in code, and they are always in effect once the checker is enabled, either directly or via the alias.
The checker also provides a configuration interface for extending the default settings with a configuration file in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
This documentation describes the syntax of the configuration file and gives the informal semantics of the configuration options.

.. contents::
   :local:

.. _clangsa-taint-configuration-overview:

Overview
________

Taint analysis works by checking for the occurrence of special operations during the symbolic execution of the program.
Taint analysis defines sources, sinks, and propagation rules. It identifies errors by detecting a flow of information that originates from a taint source, propagates through the program paths via propagation rules, and reaches a taint sink.
Which operations act as sources, sinks, or propagators is mainly domain-specific knowledge, but :ref:`alpha-security-taint-TaintPropagation` provides some built-in defaults.
It is possible to express that a function call sanitizes tainted values by providing a ``Filters`` section in the external configuration (see :ref:`clangsa-taint-configuration-example` and :ref:`clangsa-taint-filter-details`).
There are no default filters defined in the built-in settings.
The checker's documentation also specifies how to provide a custom taint configuration with command-line options.

.. _clangsa-taint-configuration-example:

Example configuration file
__________________________

.. code-block:: yaml

  # The entries that specify arguments use 0-based indexing when specifying
  # input arguments, and -1 is used to denote the return value.

  Filters:
    # Filter functions
    # Taint is sanitized when tainted variables are passed as arguments to filters.

    # Filter function
    #   void cleanse_first_arg(int* arg)
    #
    # Result example:
    #   int x; // x is tainted
    #   cleanse_first_arg(&x); // x is not tainted after the call
    - Name: cleanse_first_arg
      Args: [0]

  Propagations:
    # Source functions
    # Omitting the SrcArgs key indicates unconditional taint propagation,
    # which is conceptually what a source does.

    # Source function
    #   size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream)
    #
    # Result example:
    #   FILE* f = fopen("file.txt", "r");
    #   char buf[1024];
    #   size_t read = fread(buf, sizeof(buf[0]), sizeof(buf)/sizeof(buf[0]), f);
    #   // both read and buf are tainted
    - Name: fread
      DstArgs: [0, -1]

    # Propagation functions
    # The presence of the SrcArgs key indicates conditional taint propagation,
    # which is conceptually what a propagator does.

    # Propagation function
    #   char *dirname(char *path)
    #
    # Result example:
    #   char* path = read_path();
    #   char* dir = dirname(path);
    #   // dir is tainted if path was tainted
    - Name: dirname
      SrcArgs: [0]
      DstArgs: [-1]

  Sinks:
    # Sink functions
    # If taint reaches any of the arguments specified, a warning is emitted.

    # Sink function
    #   int system(const char* command)
    #
    # Result example:
    #   const char* command = read_command();
    #   system(command); // emit diagnostic if command is tainted
    - Name: system
      Args: [0]

In the example file above, the entries under the `Propagations` key implement the conceptual sources and propagations, and sinks have their dedicated `Sinks` key.
The user can define operations (function calls) where the tainted values should be cleansed by listing entries under the `Filters` key.
Filters model the sanitization of values done by the programmer, and providing these is key to avoiding false-positive findings.

Configuration file syntax and semantics
_______________________________________

The configuration file should have valid `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ syntax.

The configuration file can have the following top-level keys:
 - Filters
 - Propagations
 - Sinks
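
Because the file "can have" these keys, a configuration does not need to contain all three. The following minimal sketch assumes that omitted top-level keys simply contribute no extra rules; it only adds a sink for `log_raw`, a hypothetical function used here for illustration.

.. code-block:: yaml

  # Hypothetical configuration that only adds a sink;
  # the Filters and Propagations keys are omitted.
  Sinks:
    # Hypothetical sink function:
    #   void log_raw(const char *msg)
    - Name: log_raw
      Args: [0]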

Under the `Filters` key, the user can specify a list of operations that remove taint (see :ref:`clangsa-taint-filter-details` for details).

Under the `Propagations` key, the user can specify a list of operations that introduce and propagate taint (see :ref:`clangsa-taint-propagation-details` for details).
The user marks a taint source by omitting the `SrcArgs` key from an entry under `Propagations`, while propagators specify `SrcArgs`.
The lack of the `SrcArgs` key means unconditional propagation, which is how sources are modeled.
The semantics of propagation are as follows: if any of the source arguments (specified by indexes in `SrcArgs`) are tainted, then all of the destination arguments (specified by indexes in `DstArgs`) also become tainted.
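
As a sketch of this rule, consider `append_bounded`, a hypothetical function (not part of any real library) that appends up to ``n`` characters of its second argument to its first. Because `SrcArgs` lists two indexes, taint in either argument ``1`` or argument ``2`` is enough to taint both the buffer passed as argument ``0`` and the return value.

.. code-block:: yaml

  Propagations:
    # Hypothetical propagation function:
    #   char *append_bounded(char *dst, const char *src, size_t n)
    #
    # Result example:
    #   char buf[64] = "prefix";
    #   const char *in = read_input();        // in is tainted
    #   char *p = append_bounded(buf, in, 8);
    #   // buf and p are tainted, because argument 1 was tainted
    - Name: append_bounded
      SrcArgs: [1, 2]
      DstArgs: [0, -1]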

Under the `Sinks` key, the user can specify a list of operations where the checker should emit a bug report if tainted data reaches them (see :ref:`clangsa-taint-sink-details` for details).

.. _clangsa-taint-filter-details:

Filter syntax and semantics
###########################

An entry under `Filters` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters a call to this function during symbolic execution, it removes taint from the memory regions referred to by the given arguments, or treats the return value as sanitized.
 - `Args` is a list of numbers in the range of ``[-1..int_max]``.
   It indicates the indexes of arguments in the function call.
   The number ``-1`` signifies the return value; other numbers identify call arguments.
   The values of these arguments are considered clean after the function call.
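
A filter can clean several arguments at once. In the sketch below, `clamp_bounds` is a hypothetical validation function that receives two pointers; listing both indexes in `Args` means the values they point to are considered clean after the call, while the unlisted argument ``2`` is not affected by this entry.

.. code-block:: yaml

  Filters:
    # Hypothetical filter function:
    #   void clamp_bounds(int *lo, int *hi, int limit)
    #
    # Result example:
    #   int lo, hi;                   // lo and hi are tainted
    #   clamp_bounds(&lo, &hi, 100);  // lo and hi are not tainted after the call
    - Name: clamp_bounds
      Args: [0, 1]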

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls. It can encode not only namespaces but also struct/class names to match member functions.
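
For example, the following entry restricts a filter to a single namespace. The names `validation` and `sanitize` are hypothetical, and the sketch assumes that the scope string is written with its trailing ``::`` separator. Under that assumption, a call to a function named `sanitize` in another namespace, or at global scope, would not match this entry.

.. code-block:: yaml

  Filters:
    # Hypothetical C++ filter function:
    #   namespace validation { void sanitize(char *buf); }
    #
    # Result example:
    #   char buf[64];                 // assume buf is tainted
    #   validation::sanitize(buf);    // buf is not tainted after the call
    - Name: sanitize
      Scope: "validation::"
      Args: [0]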

.. _clangsa-taint-propagation-details:

Propagation syntax and semantics
################################

An entry under `Propagations` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters a call to this function during symbolic execution, it propagates taint from one or more arguments to other arguments and possibly to the return value.
   It helps model the taint-related behavior of functions that are not analyzable otherwise.

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
 - `SrcArgs` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   Taint propagation considers the values of these arguments during the evaluation of the function call.
   If any `SrcArgs` arguments are tainted, the checker will consider all `DstArgs` arguments tainted after the call.
 - `DstArgs` is a list of numbers in the range of ``[-1..int_max]`` that indicates the indexes of arguments in the function call.
   The number ``-1`` specifies the return value of the function.
   These arguments become tainted after the call if the `SrcArgs` condition holds, or unconditionally if `SrcArgs` is omitted (which is how sources are modeled).
 - `VariadicType` is a string that can be one of ``None``, ``Dst``, ``Src``.
   It is used in conjunction with `VariadicIndex` to specify arguments inside a variadic argument list.
   The value ``Src`` treats every call site argument that is part of the variadic argument list as a source with respect to propagation rules (as if specified by `SrcArgs`).
   The value ``Dst`` treats every call site argument that is part of the variadic argument list as a destination with respect to propagation rules (as if specified by `DstArgs`).
   The value ``None`` does not consider the arguments that are part of the variadic argument list (this option is redundant but can be used to temporarily switch off handling of a particular variadic argument option without removing the `VariadicIndex` key).
 - `VariadicIndex` is a number in the range of ``[0..int_max]``. It indicates the index in the function's signature at which the variadic arguments start.
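
The following sketch combines these keys for two hypothetical scanf- and snprintf-style functions, `read_fields` and `format_into`; neither is a real library function, and the entries only illustrate how `VariadicType` and `VariadicIndex` interact with `SrcArgs` and `DstArgs`. The first entry has no `SrcArgs`, so it acts as a source for every variadic argument starting at index ``1``. The second taints the buffer and the return value if the size argument or any variadic argument from index ``3`` on is tainted.

.. code-block:: yaml

  Propagations:
    # Hypothetical scanf-like source function:
    #   int read_fields(const char *fmt, ...)
    #
    # Result example:
    #   int a, b;
    #   read_fields("%d %d", &a, &b); // a and b are tainted
    - Name: read_fields
      VariadicType: Dst
      VariadicIndex: 1

    # Hypothetical snprintf-like propagation function:
    #   int format_into(char *buf, size_t size, const char *fmt, ...)
    #
    # Result example:
    #   char buf[100];
    #   int n = format_into(buf, sizeof(buf), "%d %d", x, y);
    #   // buf and n are tainted if size, x, or y is tainted
    - Name: format_into
      SrcArgs: [1]
      DstArgs: [0, -1]
      VariadicType: Src
      VariadicIndex: 3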

.. _clangsa-taint-sink-details:

Sink syntax and semantics
#########################

An entry under `Sinks` is a `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the following mandatory keys:
 - `Name` is a string that specifies the name of a function.
   When the checker encounters a call to this function during symbolic execution, it emits a taint-related diagnostic if any of the arguments specified with `Args` are tainted at the call site.
 - `Args` is a list of numbers in the range of ``[0..int_max]`` that indicates the indexes of arguments in the function call.
   The checker reports an error if any of the specified arguments are tainted.
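
A sink entry can watch only some of a function's arguments. In this sketch, `run_query` is a hypothetical database call whose second argument is a query string. Only argument ``1`` is listed, so under this entry a tainted connection handle or timeout does not by itself trigger a report.

.. code-block:: yaml

  Sinks:
    # Hypothetical sink function:
    #   int run_query(void *conn, const char *sql, int timeout)
    #
    # Result example:
    #   const char *sql = read_query();  // sql is tainted
    #   run_query(conn, sql, 30);        // emit diagnostic, argument 1 is tainted
    #   run_query(conn, "SELECT 1", t);  // not reported by this entry, even if t is tainted
    - Name: run_query
      Args: [1]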

The following keys are optional:
 - `Scope` is a string that specifies the prefix of the function's name in its fully qualified name. This option restricts the set of matching function calls.
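
As with filters, `Scope` can separate functions that share a name. The sketch below uses a hypothetical C++ class `db::Connection` with an `execute` member function. It assumes that a class name can be encoded in `Scope` the same way as described for filters, and that index ``0`` refers to the first explicit argument rather than the implicit object. Under these assumptions, unrelated functions named `execute` are not treated as sinks by this entry.

.. code-block:: yaml

  Sinks:
    # Hypothetical C++ sink member function:
    #   namespace db { struct Connection { void execute(const char *sql); }; }
    #
    # Result example:
    #   db::Connection c;
    #   c.execute(read_query()); // emit diagnostic if the query is tainted
    - Name: execute
      Scope: "db::Connection::"
      Args: [0]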