=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Intel's `Linear Address Masking`_ (LAM) also provides address tagging for
x86_64, though it is not widely available in hardware yet. For x86_64, HWASAN
has a limited implementation using page aliasing instead.

Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned by `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory,
  i.e. one shadow byte for every `TG` bytes of application memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on tag mismatch.
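
The steps above can be modelled with a small C sketch. It is an illustration
only, not the HWASAN runtime: the arena, the ``tagged_addr`` type and the
functions ``tag_object`` and ``check_access`` are hypothetical names introduced
here. The sketch assumes `TG` = 16 and `TS` = 8, lets the caller pick the tag
instead of drawing it at random (to keep the behaviour deterministic), ignores
short granules, and checks only the granule that contains the accessed address.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  enum { TG = 16, ARENA_SIZE = 256 };   /* assumed granularity and toy arena size */

  static uint8_t arena[ARENA_SIZE];        /* stand-in for application memory */
  static uint8_t shadow[ARENA_SIZE / TG];  /* one tag byte per TG bytes of arena */

  /* Hypothetical tagged "pointer": arena offset in the low bits, tag in bits 56..63. */
  typedef uint64_t tagged_addr;

  static tagged_addr make_tagged(size_t offset, uint8_t tag) {
    return ((uint64_t)tag << 56) | (uint64_t)offset;
  }

  static uint8_t pointer_tag(tagged_addr p) { return (uint8_t)(p >> 56); }

  static size_t untagged_offset(tagged_addr p) {
    return (size_t)(p & ((UINT64_C(1) << 56) - 1));
  }

  /* Tag the memory of an object that starts at a TG-aligned offset and
     return a pointer carrying the same tag. */
  static tagged_addr tag_object(size_t offset, size_t size, uint8_t tag) {
    assert(offset % TG == 0);
    size_t granules = (size + TG - 1) / TG;          /* round size up to TG */
    assert(offset / TG + granules <= sizeof shadow);
    memset(&shadow[offset / TG], tag, granules);
    return make_tagged(offset, tag);
  }

  /* Model of the check inserted before a load or store:
     returns 1 if the pointer tag matches the memory tag, 0 on mismatch. */
  static int check_access(tagged_addr p) {
    size_t offset = untagged_offset(p);
    if (offset >= sizeof arena)
      return 0;
    return shadow[offset / TG] == pointer_tag(p);
  }

In this model, a pointer returned by ``tag_object(0, 32, 0x2a)`` passes
``check_access`` for offsets 0 to 31. It fails once it is advanced into a
neighbouring object tagged with a different value, or after its own memory
has been retagged.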

For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf

Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Instrumentation
===============

Memory Accesses
---------------
In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call is reduced by using a custom calling
convention that

* preserves most registers, and
* is specialized to the register containing the address, and the type and
  size of the memory access.

Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
  [...]
  foo:
        stp     x30, x20, [sp, #-16]!
        adrp    x20, :got:__hwasan_shadow                 // load shadow address from GOT into x20
        ldr     x20, [x20, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short_v2              // call outlined tag check
                                                          // (arguments: x0 = address, x20 = shadow base;
                                                          // "2" encodes the access type and size)
        ldr     w0, [x0]                                  // inline load
        ldp     x30, x20, [sp], #16
        ret

  [...]
  __hwasan_check_x0_2_short_v2:
        sbfx    x16, x0, #4, #52                          // shadow offset
        ldrb    w16, [x20, x16]                           // load shadow tag
        cmp     x16, x0, lsr #56                          // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                    // jump to short tag handler on mismatch
  .Ltmp1:
        ret
  .Ltmp0:
        cmp     w16, #15                                  // is this a short tag?
        b.hi    .Ltmp2                                    // if not, error
        and     x17, x0, #0xf                             // find the address's position in the short granule
        add     x17, x17, #3                              // adjust to the position of the last byte loaded
        cmp     w16, w17                                  // check that position is in bounds
        b.ls    .Ltmp2                                    // if not, error
        orr     x16, x0, #0xf                             // compute address of last byte of granule
        ldrb    w16, [x16]                                // load tag from it
        cmp     x16, x0, lsr #56                          // compare with pointer tag
        b.eq    .Ltmp1                                    // if matches, continue
  .Ltmp2:
        stp     x0, x1, [sp, #-256]!                      // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                      // create frame record
        mov     x1, #2                                    // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2        // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
`free` tags the memory with a different tag.

Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
by `TG` and tagging stack memory in function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
`ReTag` can be as simple as exclusive-or with constant `M`.

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

Globals
-------

Most globals in HWASAN instrumented code are tagged. This is accomplished
using the following mechanisms:

  * The address of each global has a static tag associated with it. The first
    defined global in a translation unit has a pseudorandom tag associated
    with it, based on the hash of the file path. Subsequent global tags are
    incremental from the previously-assigned tag.

  * The global's tag is added to its symbol address in the object file's symbol
    table. This causes the global's address to be tagged when its address is
    taken.

  * When the address of a global is taken directly (i.e. not via the GOT), a special
    instruction sequence needs to be used to add the tag to the address,
    because the tag would otherwise take the address outside of the small code
    model (4GB on AArch64). No changes are required when the address is taken
    via the GOT because the address stored in the GOT will contain the tag.

  * An associated ``hwasan_globals`` section is emitted for each tagged global,
    which indicates the address of the global, its size and its tag.  These
    sections are concatenated by the linker into a single ``hwasan_globals``
    section that is enumerated by the runtime (via an ELF note) when a binary
    is loaded and the memory is tagged accordingly.

A complete example is given below:

.. code-block:: none

  // int x = 1; int *f() { return &x; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

  [...]
  f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

  [...]
        .data
  .Lx.hwasan:
        .word   1

        .globl  x
        .set    x, .Lx.hwasan+0x2d00000000000000

  [...]
        .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
  .Lhwasan.note:
        .word   8                            // namesz
        .word   8                            // descsz
        .word   3                            // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

  [...]
        .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
  .Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004                   // tag = 0x2d, size = 4

Error reporting
---------------

Errors are generated by the `HLT` instruction and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass is being used;
    ASAN or HWASAN is applied based on the function attributes.
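
As an illustration of the first point, the sketch below opts a single function
out of HWASAN instrumentation without touching ASAN. It is not taken from this
document: it assumes Clang's ``no_sanitize("hwaddress")`` function attribute
and the ``__has_feature(hwaddress_sanitizer)`` check, while the macro names
``BUILT_WITH_HWASAN`` and ``NO_SANITIZE_HWADDRESS`` and the function
``sum_unchecked`` are hypothetical.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  /* Detect whether this translation unit is being built with HWASAN. */
  #if defined(__has_feature)
  #  if __has_feature(hwaddress_sanitizer)
  #    define BUILT_WITH_HWASAN 1
  #  endif
  #endif
  #ifndef BUILT_WITH_HWASAN
  #  define BUILT_WITH_HWASAN 0
  #endif

  /* Assumed Clang spelling for disabling HWASAN on one function;
     expands to nothing on other compilers. */
  #if defined(__clang__)
  #  define NO_SANITIZE_HWADDRESS __attribute__((no_sanitize("hwaddress")))
  #else
  #  define NO_SANITIZE_HWADDRESS
  #endif

  /* Loads in this function are not preceded by HWASAN tag checks.
     ASAN, if enabled, is unaffected because it uses a separate attribute. */
  NO_SANITIZE_HWADDRESS
  static long sum_unchecked(const int *values, size_t count) {
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; ++i)
      total += values[i];
    return total;
  }

Because the two tools use distinct attributes, code that must also skip ASAN
would need the corresponding ASAN attribute in addition to this one.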

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.


Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with roughly
    `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4 and 8-bit TS
    respectively).
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.

Supported architectures
=======================
HWASAN relies on `Address Tagging`_, which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

On x86_64, HWASAN utilizes page aliasing to place tags in userspace address
bits.  Currently only heap tagging is supported.  The page aliases rely on
shared memory, which will cause heap memory to be shared between processes if
the application calls ``fork()``.  Therefore x86_64 is really only safe for
applications that do not fork.

HWASAN does not currently support 32-bit architectures since they do not
support `Address Tagging`_ and the address space is too constrained to easily
implement page aliasing.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discussed a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html
.. _Linear Address Masking: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html