Detecting ARM64 tagged pointers
===============================

The ARM64 ABI allows tagged memory addresses to be passed through the
user-kernel syscall ABI boundary. Tagged memory addresses are those which
contain a non-zero top byte. The hardware always ignores this top byte;
software, however, does not. It is therefore helpful to be able to detect
code that erroneously compares tagged memory addresses with untagged
memory addresses. This document describes how Smatch can be used for
this.

Smatch warns when it detects a comparison between user-originated 64-bit
data whose top byte may be non-zero and any variable which may contain an
untagged address.

Untagged variables are detected by looking for hard-coded known struct
members (such as vm_start, vm_end and addr_limit) and hard-coded known
macros (such as PAGE_SIZE, PAGE_MASK and TASK_SIZE). The check can also
detect comparisons against variables that have been assigned from these
known untagged values, though this tracking is limited to the scope of a
single function.

This check is only performed when the ARCH environment variable is set to
arm64. As a worked example, the following command performs Smatch static
analysis on the Linux kernel:

$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- ~/smatch/smatch_scripts/build_kernel_data.sh

It is recommended to run this command multiple times (6 or more) so that
Smatch builds up deeper knowledge of the call stack. Before running
multiple iterations of Smatch, it may be beneficial to delete any smatch*
files in the root of the Linux tree.

Once Smatch has run, the warnings can be viewed as follows:

$ grep "tagged address" smatch_warns.txt
mm/gup.c:818 __get_user_pages() warn: comparison of a potentially tagged
address (__get_user_pages, 2, start)
...
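
To make the problem described at the start of this document concrete, the
following minimal C sketch assumes a 64-bit address whose top byte (bits
63..56) carries the tag. has_tag(), with_tag() and same_location() are
hypothetical helpers written for illustration, not kernel APIs. The sketch
shows that a tagged and an untagged value can refer to the same location
while comparing unequal as plain integers, which is the class of comparison
the warning above flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 63..56 hold the tag. With arm64 Top Byte Ignore (TBI) these bits
 * are not used when a user address is translated. */
#define TAG_SHIFT     56
#define TOP_BYTE_MASK (UINT64_C(0xff) << TAG_SHIFT)

/* Hypothetical helper: true if the top byte of 'addr' is non-zero. */
static inline bool has_tag(uint64_t addr)
{
	return (addr & TOP_BYTE_MASK) != 0;
}

/* Hypothetical helper: replace the top byte of 'addr' with 'tag'. */
static inline uint64_t with_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TOP_BYTE_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Hypothetical helper: true if 'a' and 'b' differ only in their top byte,
 * i.e. they name the same user memory under TBI even though a plain
 * integer comparison treats them as different values. */
static inline bool same_location(uint64_t a, uint64_t b)
{
	return (a & ~TOP_BYTE_MASK) == (b & ~TOP_BYTE_MASK);
}
```

A comparison such as start >= vma->vm_end behaves like the plain integer
comparison here: the tag bits take part in it, so a tagged value can look
far larger than the address it actually refers to.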

This warning indicates that on line 818 of mm/gup.c a comparison may have
been made between a potentially tagged address (the variable 'start', which
originated from parameter 2 of the function; parameters are numbered from
0) and an untagged kernel address.

The code that this relates to follows:

790: static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
791:                 unsigned long start, unsigned long nr_pages,
792:                 unsigned int gup_flags, struct page **pages,
793:                 struct vm_area_struct **vmas, int *nonblocking)
794: {
...
818:         if (!vma || start >= vma->vm_end) {

Through manual inspection of this code, we can verify that the variable
'start' originated from parameter 2 of its function, __get_user_pages.

One possible fix is to apply the untagged_addr macro to 'start' before the
comparison on line 818. However, it is often helpful first to follow the
parameter up the call stack to find where the tagged value enters the
kernel. This can be done with the following Smatch command:

$ ~/smatch/smatch_data/db/smdb.py find_tagged __get_user_pages 2
  copy_strings (param ?) -> get_arg_page (param 1)
  vfio_pin_map_dma (param ?) -> vfio_pin_pages_remote (param 1)
  __se_sys_ptrace (param 2)
  environ_read (param ?) -> access_remote_vm (param 1)
  io_sqe_buffer_register (param ?) -> get_user_pages (param 0)
  gntdev_grant_copy_seg (param ?) -> gntdev_get_page (param 1)
  get_futex_key (param ?) -> get_user_pages_fast (param 0)
  __se_sys_madvise (param 0)
  __mm_populate (param ?) -> populate_vma_page_range (param 1)
  __se_sys_mprotect (param 0)

This script examines all of the possible callers of __get_user_pages where
parameter 2 contains user data whose top byte may be non-zero, recursing
up the possible call stacks as far as it can go. The result is a list of
the functions that provide tagged addresses to __get_user_pages, together
with the parameter of interest (or the variable, if Smatch cannot
determine the function parameter).
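
Returning to the suggested fix for line 818, the effect of untagging before
the comparison can be sketched in standalone C. This is a minimal sketch,
not kernel code: untag() is a hypothetical stand-in for the kernel's
untagged_addr(), modelled on the arm64 behaviour of sign-extending from
bit 55 (so a user address, with bit 55 clear, ends up with a zero top
byte). past_vma_end() and past_vma_end_fixed() are likewise hypothetical
and only mirror the shape of the check on line 818.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for untagged_addr(): sign-extend from bit 55.
 * Bit 55 clear (user address): top byte cleared.
 * Bit 55 set (kernel address): top byte set to 0xff. */
static inline uint64_t untag(uint64_t addr)
{
	if (addr & (UINT64_C(1) << 55))
		return addr | UINT64_C(0xff00000000000000);
	return addr & UINT64_C(0x00ffffffffffffff);
}

/* Shape of the original check: 'start' may be tagged, 'vm_end' never is,
 * so a tagged 'start' can wrongly appear to lie past the end. */
static inline bool past_vma_end(uint64_t start, uint64_t vm_end)
{
	return start >= vm_end;
}

/* Same check after untagging 'start', so both sides are untagged. */
static inline bool past_vma_end_fixed(uint64_t start, uint64_t vm_end)
{
	return untag(start) >= vm_end;
}
```

With an address inside the VMA, the unfixed check gives the wrong answer as
soon as a tag is present, while the fixed check does not.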

Sometimes Smatch can determine a caller of a function but cannot determine
which parameter of the caller corresponds to the parameter of the called
function. When this happens, output such as the following is shown:

get_futex_key (param ?) -> get_user_pages_fast (param 0)

This shows that, when following the call tree up from __get_user_pages,
the search stops at get_user_pages_fast, whose parameter 0 contains user
data. Smatch knows that get_futex_key calls get_user_pages_fast but cannot
determine which parameter of get_futex_key provided the data of interest.
In these cases, manual inspection of the source tree can help and, if
necessary, smdb.py can be re-run with new parameters (e.g.
smdb.py find_tagged get_futex_key 0).

To produce a summary of all of the tagged address issues found, the
following command can be run directly on the smatch_warns.txt file:

$ ~/smatch/smatch_data/db/smdb.py parse_warns_tagged smatch_warns.txt

This runs find_tagged for each issue found, for example:

mm/mmap.c:2918 (func: __do_sys_remap_file_pages, param: 0:start) may be caused by:
  __se_sys_remap_file_pages (param 0)

mm/mmap.c:2963 (func: __do_sys_remap_file_pages, param: -1:__UNIQUE_ID___y73) may be caused by:
  __do_sys_remap_file_pages (variable __UNIQUE_ID___y73 (can't walk call tree)

mm/mmap.c:3000 (func: do_brk_flags, param: -1:error) may be caused by:
  do_brk_flags (variable error (can't walk call tree)

mm/mmap.c:540 (func: find_vma_links, param: 1:addr) may be caused by:
  find_vma_links (param 1) (can't walk call tree)

mm/mmap.c:570 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x64) may be caused by:
  count_vma_pages_range (variable __UNIQUE_ID___x64 (can't walk call tree)

mm/mmap.c:580 (func: count_vma_pages_range, param: -1:__UNIQUE_ID___x68) may be caused by:
  count_vma_pages_range (variable __UNIQUE_ID___x68 (can't walk call tree)

mm/mmap.c:856 (func: __vma_adjust, param: 1:start) may be caused by:
  __se_sys_mprotect (param 0)
  __se_sys_mlock (param 0)
  __se_sys_mlock2 (param 0)
  __se_sys_munlock (param 0)
  mbind_range (param ?) -> vma_merge (param 2)
  __se_sys_madvise (param 0)
  __se_sys_mbind (param 0)

The commands above do not output a call stack; instead they report the
'highest' caller found. To obtain a call stack, run the following:

$ ~/smatch/smatch_data/db/smdb.py call_tree __get_user_pages
__get_user_pages()
  __get_user_pages_locked()
  get_user_pages_remote()
  get_arg_page()
  copy_strings()
  remove_arg_zero()
  vaddr_get_pfn()
  vfio_pin_pages_remote()
  vfio_pin_page_external()
  process_vm_rw_single_vec()
  process_vm_rw_core()
  __access_remote_vm()
  ptrace_access_vm()
  access_remote_vm()
  access_process_vm()
  check_and_migrate_cma_pages()
  __gup_longterm_locked()
  get_user_pages()
  __gup_longterm_unlocked()
  get_user_pages_locked()
  get_vaddr_frames()
  vb2_create_framevec()
  lookup_node()
  do_get_mempolicy()
  get_user_pages_unlocked()
  hva_to_pfn_slow()
  hva_to_pfn()

Please note that this shows all of the callers and is not filtered for
those carrying tagged addresses in their parameters.

It is possible to filter out false positives by annotating function
parameters with __untagged. For example:

unsigned long do_mmap(struct file *file, unsigned long addr,
                unsigned long __untagged len, unsigned long prot,
                unsigned long flags, vm_flags_t vm_flags,
                unsigned long pgoff, unsigned long *populate,
                struct list_head *uf)
{
        ...
}

This annotation tells Smatch that, regardless of the value stored in
'len', it should be treated as untagged. Because Smatch tracks the
potential ranges of values a variable may hold, it also tracks the
annotation, so it is not necessary to repeat the annotation in every
function that do_mmap calls. When this annotation is used, smdb.py filters
out functions that carry a value which has been annotated as untagged.
Please note that, due to limitations in parameter tracking, some
annotations will be ignored and not propagated all the way down the call
tree.

Finally, the following patch is required to add the __untagged annotation
to the Linux kernel:

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 19e58b9138a0..755e8df375a5 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -19,6 +19,7 @@
 # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu __attribute__((noderef, address_space(3)))
 # define __rcu __attribute__((noderef, address_space(4)))
+# define __untagged __attribute__((address_space(5)))
 # define __private __attribute__((noderef))
 extern void __chk_user_ptr(const volatile void __user *);
 extern void __chk_io_ptr(const volatile void __iomem *);
@@ -45,6 +46,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __cond_lock(x,c) (c)
 # define __percpu
 # define __rcu
+# define __untagged
 # define __private
 # define ACCESS_PRIVATE(p, member) ((p)->member)
 #endif /* __CHECKER__ */
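
The patch adds __untagged to both branches of the __CHECKER__ conditional,
which is why annotated prototypes cost nothing in a normal build. The
following minimal sketch reuses those two definitions in a standalone file.
len_to_pages() and EXAMPLE_PAGE_SIZE are hypothetical and only show where
the annotation goes, in the same position as 'len' in do_mmap above. An
ordinary C compiler does not define __CHECKER__, so __untagged expands to
nothing and the function compiles exactly as if it were unannotated. A
checker that defines __CHECKER__ (as sparse does, and Smatch, which is built
on sparse, is assumed to) sees the address_space(5) attribute instead.

```c
/* Same definitions as in the patch: a checker sees an address_space
 * attribute, while a normal compiler sees an empty macro. */
#ifdef __CHECKER__
# define __untagged __attribute__((address_space(5)))
#else
# define __untagged
#endif

/* Hypothetical page size, for illustration only. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Hypothetical function in the style of do_mmap(): 'len' is a length, not
 * a user pointer, so it is annotated to tell the checker it is untagged.
 * Returns the number of EXAMPLE_PAGE_SIZE pages needed to hold 'len'
 * bytes, rounding up. */
static unsigned long len_to_pages(unsigned long __untagged len)
{
	return (len + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
}
```

Because the macro is empty outside the checker, the annotation can be added
to existing prototypes without changing the generated code.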