=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior.  To use DataFlowSanitizer, the program
calls API functions to attach labels to data, which causes that data to be
tracked, and to check the label of a specific data item.  DataFlowSanitizer
manages the propagation of labels through the program according to its data
flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or a variant of this ABI that also propagates labels through function
parameters and return values.  For calls that use the native ABI, the ABI
list file also controls how labels are propagated.

DataFlowSanitizer comes with a default ABI list which is intended to
eventually cover the glibc library on Linux.  Users may need to extend the
ABI list in cases where a particular library or function cannot be
instrumented (e.g. because it is implemented in assembly or another language
which DataFlowSanitizer does not support) or where a function is called from
a library or function which cannot be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown.  The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory).  Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function.  This wrapper may call
  the original function or provide its own implementation.  This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior.  The signature
  of ``__dfsw_F`` is based on that of ``F``, with one label of type
  ``dfsan_label`` per argument appended to the argument list.  If ``F``
  has a non-void return type, a final argument of type ``dfsan_label *``
  is appended, through which the custom function can store the label for the
  return value.  For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);
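
As an illustration of the signature rule above, the following is a minimal
sketch of what the body of a custom wrapper might look like.  The function
``negate`` is hypothetical, and the ``dfsan_label`` typedef is a stand-in so
that the sketch builds without the DataFlowSanitizer runtime; a real wrapper
would take the type from ``sanitizer/dfsan_interface.h`` instead.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Stand-in for the type declared in sanitizer/dfsan_interface.h
  // (assumption: defined here only so the sketch is self-contained).
  typedef std::uint16_t dfsan_label;

  // Hypothetical function standing in for code that cannot be instrumented.
  // In practice it would live in a separate, uninstrumented library.
  extern "C" int negate(int x) { return -x; }

  // Custom wrapper following the __dfsw_F convention: the original argument,
  // then one label per argument, then a pointer for the return value's label.
  extern "C" int __dfsw_negate(int x, dfsan_label x_label,
                               dfsan_label *ret_label) {
    assert(ret_label != nullptr);
    // The result depends only on x, so it carries x's label.
    *ret_label = x_label;
    return negate(x);
  }

With ``fun:negate=uninstrumented`` and ``fun:negate=custom`` in the ABI list,
a call to ``negate`` would be routed to ``__dfsw_negate`` as described for
the ``custom`` category.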

If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI.  Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }
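
To make the assertions above easier to follow, here is a toy model of the
semantics they rely on: a computed value carries the union of the labels of
its inputs, and a "has label" query asks whether one label is contained in
another.  The ``ToyLabel`` bitmask, the ``Tracked`` struct and the helper
names are hypothetical and do not reflect how DataFlowSanitizer represents
labels internally; the sketch runs without the sanitizer.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Toy label: each base label is one bit; a union is the bitwise OR.
  // (Hypothetical representation, chosen only for illustration.)
  typedef std::uint32_t ToyLabel;

  // A value paired with the label that records where it came from.
  struct Tracked {
    int value;
    ToyLabel label;
  };

  // Addition propagates the union of both operands' labels.
  inline Tracked add(Tracked a, Tracked b) {
    return Tracked{a.value + b.value, a.label | b.label};
  }

  // True if every base label in `part` is also present in `whole`.
  inline bool has_label(ToyLabel whole, ToyLabel part) {
    return (whole & part) == part;
  }

  // Mirrors the checks of the example program on the toy model.
  inline void demo() {
    Tracked i{1, 1u << 0}, j{2, 1u << 1}, k{3, 1u << 2};
    Tracked ij = add(i, j);
    assert(has_label(ij.label, i.label));
    assert(has_label(ij.label, j.label));
    assert(!has_label(ij.label, k.label));
    Tracked ijk = add(ij, k);
    assert(has_label(ijk.label, k.label));
  }

In the real tool the program does not carry labels around by hand like this;
the instrumentation propagates them alongside ordinary values.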

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.