=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on tag mismatch.

For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf
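
As a rough illustration of the steps above, the following C sketch models
tagging and checking over a small simulated arena rather than real process
memory. It assumes `TG = 16`, 8-bit tags, and a tag packed into the top byte
of a 64-bit value; the names `shadow`, `make_tagged`, `tag_region` and
`check_access` are hypothetical and are not part of the HWASAN runtime. The
tag is passed in by the caller (instead of being chosen at random) so that
the behavior is reproducible.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  enum { TG = 16, ARENA_SIZE = 256 };       /* hypothetical sizes for the model */

  static uint8_t shadow[ARENA_SIZE / TG];   /* one tag byte per TG-byte granule */

  /* Model of a tagged pointer: arena offset in the low 56 bits, tag on top. */
  static uint64_t make_tagged(uint64_t offset, uint8_t tag) {
    return ((uint64_t)tag << 56) | (offset & UINT64_C(0x00ffffffffffffff));
  }

  static uint8_t pointer_tag(uint64_t p) { return (uint8_t)(p >> 56); }

  static uint64_t pointer_offset(uint64_t p) {
    return p & UINT64_C(0x00ffffffffffffff);
  }

  /* Tag every granule covering [offset, offset + size) and return the tagged
     pointer. The offset must be TG-aligned; size is rounded up to granules. */
  static uint64_t tag_region(uint64_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0 && offset + size <= ARENA_SIZE);
    for (uint64_t g = offset / TG; g < (offset + size + TG - 1) / TG; ++g)
      shadow[g] = tag;
    return make_tagged(offset, tag);
  }

  /* The check made before a load or store: memory tag must equal pointer tag. */
  static bool check_access(uint64_t p) {
    uint64_t off = pointer_offset(p);
    if (off >= ARENA_SIZE)
      return false;
    return shadow[off / TG] == pointer_tag(p);
  }

In this model an access one granule past the end of an object fails the check
whenever the neighboring granule carries a different tag, and retagging a
region makes every older pointer to it fail, which is how buffer overflows and
use-after-free are caught without redzones or quarantine.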

Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.
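
The two-way check described in this section might be modeled as in the C
sketch below. It assumes `TG = 16`, 8-bit tags, a simulated arena, and
accesses that do not cross a granule boundary; `memory`, `shadow`,
`tags_match` and `make_short_granule` are hypothetical names used only for
this illustration.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  enum { TG = 16, ARENA_SIZE = 64 };        /* hypothetical sizes for the model */

  static uint8_t memory[ARENA_SIZE];        /* simulated application memory */
  static uint8_t shadow[ARENA_SIZE / TG];   /* one shadow byte per granule */

  /* Check an access of `size` bytes at arena offset `off` made through a
     pointer that carries `ptr_tag`. */
  static bool tags_match(uint8_t ptr_tag, uint32_t off, uint32_t size) {
    assert(size >= 1 && off + size <= ARENA_SIZE);
    uint8_t mem_tag = shadow[off / TG];
    if (mem_tag == ptr_tag)
      return true;                 /* possibility 1: the shadow tag matches */
    if (mem_tag >= TG)
      return false;                /* not a short granule size: mismatch */
    uint32_t last = (off % TG) + size - 1;  /* last byte touched, in-granule */
    if (last >= mem_tag)
      return false;                /* access runs past the short granule */
    /* possibility 2: the real tag is stored in the granule's last byte */
    return memory[off | (TG - 1)] == ptr_tag;
  }

  /* Mark the granule at `granule_off` as short with `used` valid bytes. */
  static void make_short_granule(uint32_t granule_off, uint8_t used,
                                 uint8_t tag) {
    assert(granule_off % TG == 0 && used >= 1 && used < TG);
    shadow[granule_off / TG] = used;
    memory[granule_off + TG - 1] = tag;
  }

For example, after `make_short_granule(16, 5, 0xab)` a 4-byte access at
offset 16 with pointer tag `0xab` passes, while a 1-byte access at offset 21
is rejected because it lies beyond the five valid bytes.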

Instrumentation
===============

Memory Accesses
---------------
All memory accesses are prefixed with an inline instruction sequence that
verifies the tags. Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux -fsanitize=hwaddress -fsanitize-recover=hwaddress -c load.c
  foo:
       0:	90000008 	adrp	x8, 0 <__hwasan_shadow>
       4:	f9400108 	ldr	x8, [x8]         // shadow base (to be resolved by the loader)
       8:	d344dc09 	ubfx	x9, x0, #4, #52  // shadow offset
       c:	38696909 	ldrb	w9, [x8, x9]     // load shadow tag
      10:	d378fc08 	lsr	x8, x0, #56      // extract address tag
      14:	6b09011f 	cmp	w8, w9           // compare tags
      18:	54000061 	b.ne	24 <foo+0x24>    // jump to short tag handler on mismatch
      1c:	b9400000 	ldr	w0, [x0]         // original load
      20:	d65f03c0 	ret
      24:	7100413f 	cmp	w9, #0x10        // is this a short tag?
      28:	54000142 	b.cs	50 <foo+0x50>    // if not, trap
      2c:	12000c0a 	and	w10, w0, #0xf    // find the address's position in the short granule
      30:	11000d4a 	add	w10, w10, #0x3   // adjust to the position of the last byte loaded
      34:	6b09015f 	cmp	w10, w9          // check that position is in bounds
      38:	540000c2 	b.cs	50 <foo+0x50>    // if not, trap
      3c:	9240dc09 	and	x9, x0, #0xffffffffffffff // clear the pointer tag (top byte)
      40:	b2400d29 	orr	x9, x9, #0xf     // compute address of last byte of granule
      44:	39400129 	ldrb	w9, [x9]         // load tag from it
      48:	6b09011f 	cmp	w8, w9           // compare with pointer tag
      4c:	54fffe80 	b.eq	1c <foo+0x1c>    // if so, continue
      50:	d4212440 	brk	#0x922           // otherwise trap
      54:	b9400000 	ldr	w0, [x0]         // tail duplicated original load (to handle recovery)
      58:	d65f03c0 	ret
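
The address arithmetic in the sequence above can be restated in C. The sketch
below assumes `TG = 16` and an 8-bit tag in bits 56..63, as in the listing;
the function names are hypothetical, and the code works on plain 64-bit
integers without dereferencing anything.

.. code-block:: c

  #include <stdint.h>

  /* ubfx x9, x0, #4, #52: bits [55:4] of the pointer, i.e. the untagged
     address divided by 16. */
  static uint64_t shadow_offset(uint64_t ptr) {
    return (ptr >> 4) & ((UINT64_C(1) << 52) - 1);
  }

  /* lsr x8, x0, #56: the pointer tag is the top byte. */
  static uint8_t address_tag(uint64_t ptr) {
    return (uint8_t)(ptr >> 56);
  }

  /* and w10, w0, #0xf; add w10, w10, #(size - 1): position, within the
     granule, of the last byte accessed. */
  static uint32_t last_byte_position(uint64_t ptr, uint32_t access_size) {
    return (uint32_t)(ptr & 0xf) + access_size - 1;
  }

  /* and x9, x0, #0xffffffffffffff; orr x9, x9, #0xf: untagged address of
     the last byte of the granule, where a short granule keeps its tag. */
  static uint64_t short_tag_address(uint64_t ptr) {
    return (ptr & UINT64_C(0x00ffffffffffffff)) | 0xf;
  }

For the pointer `0x2a00007fffff1234` these helpers give a shadow offset of
`0x7fffff123`, an address tag of `0x2a`, a last-byte position of 7 for a
4-byte access, and a short tag address of `0x7fffff123f`.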

Alternatively, memory accesses are prefixed with a function call.
On AArch64, a function call is used by default in trapping mode. The code size
and performance overhead of the call are reduced by using a custom calling
convention that preserves most registers and is specialized to the register
containing the address and the type and size of the memory access.

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
`free` tags the memory with a different tag.
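
A toy bump allocator illustrates the idea. This is only a sketch under
several assumptions: `TG = 16`, 8-bit tags drawn from `rand()`, a simulated
heap addressed by offsets, and sizes rounded up to whole granules (so short
granules are not modeled). The names `toy_ptr`, `toy_malloc`, `toy_free` and
`toy_check` are hypothetical and do not correspond to the HWASAN allocator.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <stdlib.h>

  enum { TG = 16, HEAP_SIZE = 1024 };       /* hypothetical sizes for the model */

  static uint8_t heap_shadow[HEAP_SIZE / TG];
  static size_t heap_top;                   /* bump pointer, always TG-aligned */

  /* Model of a tagged pointer; the size is kept only so toy_free can retag. */
  typedef struct { size_t offset; size_t size; uint8_t tag; } toy_ptr;

  static uint8_t random_tag(void) { return (uint8_t)(rand() & 0xff); }

  static void set_tags(size_t offset, size_t size, uint8_t tag) {
    for (size_t g = offset / TG; g < (offset + size) / TG; ++g)
      heap_shadow[g] = tag;
  }

  /* Allocate a TG-aligned block and tag both the pointer and the memory. */
  static toy_ptr toy_malloc(size_t size) {
    size_t rounded = (size + TG - 1) / TG * TG;
    assert(heap_top + rounded <= HEAP_SIZE);
    toy_ptr p = { heap_top, rounded, random_tag() };
    set_tags(p.offset, rounded, p.tag);
    heap_top += rounded;
    return p;
  }

  /* Retag the block with a tag that is guaranteed to differ from the old one
     (an assumption of this sketch), so stale pointers no longer match. */
  static void toy_free(toy_ptr p) {
    uint8_t t;
    do { t = random_tag(); } while (t == p.tag);
    set_tags(p.offset, p.size, t);
  }

  /* The check made before accessing byte `index` through `p`. */
  static bool toy_check(toy_ptr p, size_t index) {
    size_t off = p.offset + index;
    return off < HEAP_SIZE && heap_shadow[off / TG] == p.tag;
  }

After `toy_free(a)`, `toy_check(a, 0)` returns false, which corresponds to
detecting a use-after-free without keeping the block in a quarantine.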

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
ReTag can be as simple as exclusive-or with constant `M`.
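
A minimal sketch of the exclusive-or variant of `ReTag`, assuming 8-bit tags
(`TS = 8`); the name `retag` and the constant `TAG_MASK` are hypothetical:

.. code-block:: c

  #include <stdint.h>

  enum { TAG_MASK = 0xff };   /* assumes 8-bit tags (TS = 8) */

  /* One possible ReTag: exclusive-or of the base tag with the alloca index. */
  static uint8_t retag(uint8_t base_tag, unsigned alloca_index) {
    return (uint8_t)((base_tag ^ alloca_index) & TAG_MASK);
  }

Because exclusive-or with distinct constants yields distinct results, allocas
in the same frame receive different tags as long as there are no more than
`2**TS` of them, and only the base tag has to be kept live.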

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

TODO: details.

Error reporting
---------------

Errors are generated by a trapping instruction (`BRK` in the AArch64 sequence shown under Memory Accesses) and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass is being used;
    ASAN or HWASAN is applied based on the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.
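
For code that already opts out of ASAN instrumentation, adding the HWASAN
opt-out might look like the sketch below. It assumes Clang's `no_sanitize`
function attribute with the `"address"` and `"hwaddress"` sanitizer names;
the macro `NO_SANITIZE_MEMORY_TOOLS` and the function `read_raw` are
hypothetical names chosen for this example.

.. code-block:: c

  /* Expands to nothing on compilers that lack the attribute. */
  #if defined(__has_attribute)
  #  if __has_attribute(no_sanitize)
  #    define NO_SANITIZE_MEMORY_TOOLS \
         __attribute__((no_sanitize("address", "hwaddress")))
  #  endif
  #endif
  #ifndef NO_SANITIZE_MEMORY_TOOLS
  #  define NO_SANITIZE_MEMORY_TOOLS
  #endif

  /* Hypothetical helper that must stay uninstrumented under both tools. */
  NO_SANITIZE_MEMORY_TOOLS
  static int read_raw(const int *p) {
    return *p;
  }

Wrapping the attribute in a single macro keeps the two opt-outs together, so
code that previously disabled only ASAN can be updated in one place.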


Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with roughly
    `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4- and 8-bit `TS`,
    respectively).
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.
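
The quoted miss rates follow if tags are assumed to be uniformly distributed
and independent, so that a bad access lands on memory carrying the same tag
as the pointer with probability

.. math::

   P_{\text{miss}} \approx \frac{1}{2^{TS}}, \qquad
   \frac{1}{2^{4}} = \frac{1}{16} = 6.25\%, \qquad
   \frac{1}{2^{8}} = \frac{1}{256} \approx 0.39\%.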

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.
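
Both contributions can be estimated per object with a small calculation.
The sketch below assumes one shadow byte per granule, as in the `TG=>1`
shadow described under Algorithm; `aligned_size` and `shadow_bytes` are
hypothetical helper names.

.. code-block:: c

  #include <stddef.h>

  /* Bytes occupied by an object of `size` bytes once it is rounded up to a
     whole number of granules of `tg` bytes. */
  static size_t aligned_size(size_t size, size_t tg) {
    return (size + tg - 1) / tg * tg;
  }

  /* Shadow bytes needed for that object: one per granule. */
  static size_t shadow_bytes(size_t size, size_t tg) {
    return aligned_size(size, tg) / tg;
  }

With `TG = 16`, a 20-byte object occupies 32 bytes plus 2 bytes of shadow,
while a 64-byte object occupies exactly 64 bytes plus 4 bytes of shadow, which
is the `1/TG` ratio.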

Supported architectures
=======================
HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

HWASAN's approach is not applicable to 32-bit architectures.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html