=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows, and
uses a *quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object, a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on a tag mismatch.

For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf

Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes.
The size of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the bytes being accessed
  are in bounds of the short granule, and the pointer tag is equal to the last
  byte of the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Instrumentation
===============

Memory Accesses
---------------
All memory accesses are prefixed with an inline instruction sequence that
verifies the tags. Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux -fsanitize=hwaddress -fsanitize-recover=hwaddress -c load.c
  foo:
       0:  90000008  adrp  x8, 0 <__hwasan_shadow>
       4:  f9400108  ldr   x8, [x8]         // shadow base (to be resolved by the loader)
       8:  d344dc09  ubfx  x9, x0, #4, #52  // shadow offset
       c:  38696909  ldrb  w9, [x8, x9]     // load shadow tag
      10:  d378fc08  lsr   x8, x0, #56      // extract address tag
      14:  6b09011f  cmp   w8, w9           // compare tags
      18:  54000061  b.ne  24 <foo+0x24>    // jump to short tag handler on mismatch
      1c:  b9400000  ldr   w0, [x0]         // original load
      20:  d65f03c0  ret
      24:  7100413f  cmp   w9, #0x10        // is this a short tag?
      28:  54000142  b.cs  50 <foo+0x50>    // if not, trap
      2c:  12000c0a  and   w10, w0, #0xf    // find the address's position in the short granule
      30:  11000d4a  add   w10, w10, #0x3   // adjust to the position of the last byte loaded
      34:  6b09015f  cmp   w10, w9          // check that position is in bounds
      38:  540000c2  b.cs  50 <foo+0x50>    // if not, trap
      3c:  9240dc09  and   x9, x0, #0xffffffffffffff
      40:  b2400d29  orr   x9, x9, #0xf     // compute address of last byte of granule
      44:  39400129  ldrb  w9, [x9]         // load tag from it
      48:  6b09011f  cmp   w8, w9           // compare with pointer tag
      4c:  54fffe80  b.eq  1c <foo+0x1c>    // if so, continue
      50:  d4212440  brk   #0x922           // otherwise trap
      54:  b9400000  ldr   w0, [x0]         // tail duplicated original load (to handle recovery)
      58:  d65f03c0  ret

Alternatively, memory accesses are prefixed with a function call.
On AArch64, a function call is used by default in trapping mode. The code size
and performance overhead of the call are reduced by using a custom calling
convention that preserves most registers and is specialized to the register
containing the address and the type and size of the memory access.

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
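
The following is a minimal C sketch of the heap scheme and the tag check. It makes several simplifying assumptions:

* Memory is simulated with a byte array plus a `TG=>1` shadow array, instead of using real AArch64 Address Tagging.
* `TG` is 16 and tags are 8 bits.
* Accesses are assumed not to cross a granule boundary.
* Tags are drawn only from `TG` to 255, so no tag is ever mistaken for a short granule size. The real tool allows every tag value.

The names `tagged_malloc`, `tagged_free`, `check_access`, and `probe` are hypothetical and are not part of the HWASAN runtime. `check_access` follows the same decision steps as the AArch64 instruction sequence above, including the short granule path. `tagged_free` also shows the retagging performed by `free`.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Toy model (hypothetical): "addresses" are offsets into `arena`. */
  #define TG 16                                   /* tagging granularity (bytes) */
  #define ARENA_SIZE 4096                         /* size of the simulated heap */
  #define TAG_SHIFT 56                            /* tag lives in the top byte */
  #define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

  static uint8_t arena[ARENA_SIZE];               /* simulated application memory */
  static uint8_t shadow[ARENA_SIZE / TG];         /* TG=>1 shadow: one tag per granule */
  static uint64_t next_free = 0;                  /* bump allocator, never reuses memory */

  static uint8_t tag_of(uint64_t p) { return (uint8_t)(p >> TAG_SHIFT); }
  static uint64_t addr_of(uint64_t p) { return p & ADDR_MASK; }

  /* Simplification: tags come from [TG, 255] so they never look like a
     short granule size; the real tool allows every tag value. */
  static uint8_t random_tag(void) { return (uint8_t)(TG + rand() % (256 - TG)); }

  /* Allocate n bytes: round up to TG, tag the shadow and the returned pointer. */
  static uint64_t tagged_malloc(size_t n) {
      if (n == 0 || n > ARENA_SIZE - next_free) return 0;  /* 0 means failure */
      uint64_t a = next_free;
      next_free += (n + TG - 1) / TG * TG;                 /* keep objects TG-aligned */
      uint8_t t = random_tag();
      size_t full = n / TG;                                /* number of full granules */
      memset(&shadow[a / TG], t, full);
      if (n % TG != 0) {                                   /* trailing short granule */
          shadow[a / TG + full] = (uint8_t)(n % TG);       /* shadow holds its size */
          arena[a + full * TG + TG - 1] = t;               /* last byte holds the tag */
      }
      return ((uint64_t)t << TAG_SHIFT) | a;
  }

  /* Retag the whole region with a different tag; the size is passed in
     only to avoid keeping a size table in this sketch. */
  static void tagged_free(uint64_t p, size_t n) {
      uint8_t t;
      do { t = random_tag(); } while (t == tag_of(p));
      memset(&shadow[addr_of(p) / TG], t, (n + TG - 1) / TG);
  }

  /* Returns 1 if the access is allowed, 0 where HWASAN would trap. */
  static int check_access(uint64_t p, size_t size) {
      uint64_t a = addr_of(p);
      if (size == 0 || a >= ARENA_SIZE || size > ARENA_SIZE - a) return 0;
      uint8_t ptr_tag = tag_of(p), mem_tag = shadow[a / TG];
      if (ptr_tag == mem_tag) return 1;                      /* full tag match */
      if (mem_tag >= TG) return 0;                           /* not a short granule */
      if (a % TG + size - 1 >= (uint64_t)mem_tag) return 0;  /* past the short size */
      return ptr_tag == arena[a | (TG - 1)];                 /* tag in the last byte */
  }

  static void probe(const char *what, uint64_t p, size_t size, int expect_ok) {
      int ok = check_access(p, size);
      printf("%-16s %s\n", what, ok ? "ok" : "tag mismatch");
      assert(ok == expect_ok);
  }

  int main(void) {
      srand(1);
      uint64_t p = tagged_malloc(20);    /* one full granule + a 4-byte short one */
      probe("p[0..3]", p, 4, 1);
      probe("p[19]", p + 19, 1, 1);      /* last valid byte, inside the short granule */
      probe("p[20]", p + 20, 1, 0);      /* overflow within the short granule */
      probe("p[32]", p + 32, 1, 0);      /* overflow into untagged memory (tag 0) */
      tagged_free(p, 20);
      probe("use after free", p, 4, 0);  /* pointer still carries the old tag */
      return 0;
  }

Built with assertions enabled, it reports "ok" for the two in-bounds accesses. It reports "tag mismatch" for the two overflows and the use-after-free, whichever tags `rand` happens to produce. The bump allocator, which never reuses memory, is there only to keep the sketch short.
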

`free` tags the memory with a different tag, so a dangling pointer, which
still carries the old tag, is likely to be caught on its next use.

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead, we generate a single base tag value in the prologue
and build the tag for alloca number `i` as `ReTag(BaseTag, i)`, where
`ReTag` can be as simple as exclusive-or with the constant `i`.

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

TODO: details.

Error reporting
---------------

Errors are generated by a trap instruction (`brk` in the AArch64 sequence
above) and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to reuse ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

 * Users may need to disable ASAN but not HWASAN, or vice versa,
   because the tools have different trade-offs and compatibility issues.
 * Ideally, LLVM does not use flags to decide which pass is applied;
   whether ASAN or HWASAN is applied is decided by the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to code that already uses the old attribute.


Comparison with AddressSanitizer
================================

HWASAN:
 * Is less portable than :doc:`AddressSanitizer`
   as it relies on hardware `Address Tagging`_ (AArch64).
   Address Tagging can be emulated with compiler instrumentation,
   but that would require the instrumentation to remove the tags before
   every load or store, which is infeasible in any realistic environment
   that contains non-instrumented code.
 * May have compatibility problems if the target code uses higher
   pointer bits for other purposes.
 * May require changes in the OS kernels (e.g. Linux seems to dislike
   tagged pointers passed from user space:
   https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
 * **Does not require redzones to detect buffer overflows**,
   but the buffer overflow detection is probabilistic, with roughly a
   `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with a 4-bit or
   8-bit `TS`, respectively).
 * **Does not require quarantine to detect heap-use-after-free
   or stack-use-after-return**.
   The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.

Supported architectures
=======================
HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures, it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

HWASAN's approach is not applicable to 32-bit architectures.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar,
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html