#
ff474a78 |
| 11-Jun-2024 |
Jiri Olsa <jolsa@kernel.org> |
uprobe: Add uretprobe syscall to speed up return probe
Adding uretprobe syscall instead of trap to speed up return probe.
At the moment the uretprobe setup/path is:
- install entry uprobe
- w
uprobe: Add uretprobe syscall to speed up return probe
Adding uretprobe syscall instead of trap to speed up return probe.
At the moment the uretprobe setup/path is:
- install entry uprobe
- when the uprobe is hit, it overwrites probed function's return address on stack with address of the trampoline that contains breakpoint instruction
- the breakpoint trap code handles the uretprobe consumers execution and jumps back to original return address
This patch replaces the above trampoline's breakpoint instruction with new ureprobe syscall call. This syscall does exactly the same job as the trap with some more extra work:
- syscall trampoline must save original value for rax/r11/rcx registers on stack - rax is set to syscall number and r11/rcx are changed and used by syscall instruction
- the syscall code reads the original values of those registers and restore those values in task's pt_regs area
- only caller from trampoline exposed in '[uprobes]' is allowed, the process will receive SIGILL signal otherwise
Even with some extra work, using the uretprobes syscall shows speed improvement (compared to using standard breakpoint):
On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
current: uretprobe-nop : 1.498 ± 0.000M/s uretprobe-push : 1.448 ± 0.001M/s uretprobe-ret : 0.816 ± 0.001M/s
with the fix: uretprobe-nop : 1.969 ± 0.002M/s < 31% speed up uretprobe-push : 1.910 ± 0.000M/s < 31% speed up uretprobe-ret : 0.934 ± 0.000M/s < 14% speed up
On Amd (AMD Ryzen 7 5700U)
current: uretprobe-nop : 0.778 ± 0.001M/s uretprobe-push : 0.744 ± 0.001M/s uretprobe-ret : 0.540 ± 0.001M/s
with the fix: uretprobe-nop : 0.860 ± 0.001M/s < 10% speed up uretprobe-push : 0.818 ± 0.001M/s < 10% speed up uretprobe-ret : 0.578 ± 0.000M/s < 7% speed up
The performance test spawns a thread that runs loop which triggers uprobe with attached bpf program that increments the counter that gets printed in results above.
The uprobe (and uretprobe) kind is determined by which instruction is being patched with breakpoint instruction. That's also important for uretprobes, because uprobe is installed for each uretprobe.
The performance test is part of bpf selftests: tools/testing/selftests/bpf/run_bench_uprobes.sh
Note at the moment uretprobe syscall is supported only for native 64-bit process, compat process still uses standard breakpoint.
Note that when shadow stack is enabled the uretprobe syscall returns via iret, which is slower than return via sysret, but won't cause the shadow stack violation.
Link: https://lore.kernel.org/all/20240611112158.40795-4-jolsa@kernel.org/
Suggested-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
show more ...
|
#
1713b63a |
| 11-Jun-2024 |
Jiri Olsa <jolsa@kernel.org> |
x86/shstk: Make return uprobe work with shadow stack
Currently the application with enabled shadow stack will crash if it sets up return uprobe. The reason is the uretprobe kernel code changes the u
x86/shstk: Make return uprobe work with shadow stack
Currently the application with enabled shadow stack will crash if it sets up return uprobe. The reason is the uretprobe kernel code changes the user space task's stack, but does not update shadow stack accordingly.
Adding new functions to update values on shadow stack and using them in uprobe code to keep shadow stack in sync with uretprobe changes to user stack.
Link: https://lore.kernel.org/all/20240611112158.40795-2-jolsa@kernel.org/
Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Fixes: 488af8ea7131 ("x86/shstk: Wire in shadow stack interface") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
show more ...
|
#
67840ad0 |
| 13-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Add ARCH_SHSTK_STATUS
CRIU and GDB need to get the current shadow stack and WRSS enablement status. This information is already available via /proc/pid/status, but this is inconvenient fo
x86/shstk: Add ARCH_SHSTK_STATUS
CRIU and GDB need to get the current shadow stack and WRSS enablement status. This information is already available via /proc/pid/status, but this is inconvenient for CRIU because it involves parsing the text output in an area of the code where this is difficult. Provide a status arch_prctl(), ARCH_SHSTK_STATUS for retrieving the status. Have arg2 be a userspace address, and make the new arch_prctl simply copy the features out to userspace.
Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-43-rick.p.edgecombe%40intel.com
show more ...
|
#
05e36022 |
| 13-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Handle signals for shadow stack
When a signal is handled, the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only tracks return addresses, th
x86/shstk: Handle signals for shadow stack
When a signal is handled, the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only tracks return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are visible to userspace and which will be kernel ABI for shadow stacks.
One is to make sure the restorer address is written to shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer.
The other thing to do is to place some type of checkable token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks.
For this token the shadow stack data format defined earlier can be used. Have the data pushed be the previous SSP. In the future the sigreturn might want to return back to a different stack. Storing the SSP (instead of a restore offset or something) allows for future functionality that may want to restore to a different stack.
So, when handling a signal push - the SSP pointing in the shadow stack data format - the restorer address below the restore token.
In sigreturn, verify SSP is stored in the data format and pop the shadow stack.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-32-rick.p.edgecombe%40intel.com
show more ...
|
#
b2926a36 |
| 13-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Handle thread shadow stack
When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflic
x86/shstk: Handle thread shadow stack
When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways.
With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks.
For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack.
For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it.
Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB.
For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork().
Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent.
During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-30-rick.p.edgecombe%40intel.com
show more ...
|
#
2d39a6ad |
| 13-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Add user-mode shadow stack support
Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a
x86/shstk: Add user-mode shadow stack support
Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB).
Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this.
32 bit shadow stack is not expected to have many users and it will complicate the signal implementation. So do not support IA32 emulation or x32.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-29-rick.p.edgecombe%40intel.com
show more ...
|
#
98cfa463 |
| 13-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86: Introduce userspace API for shadow stack
Add three new arch_prctl() handles:
- ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value
x86: Introduce userspace API for shadow stack
Add three new arch_prctl() handles:
- ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value on error.
- ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or a negative value on error.
The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec().
Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-27-rick.p.edgecombe%40intel.com
show more ...
|