#
fef44eba |
| 20-Sep-2023 |
Colin Ian King <colin.i.king@gmail.com> |
x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
The 'mid' pointer is being initialized with a value that is never read, it is being re-assigned and used inside a for
x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
The 'mid' pointer is being initialized with a value that is never read, it is being re-assigned and used inside a for-loop. Remove the redundant initialization.
Cleans up clang scan build warning:
arch/x86/kernel/unwind_orc.c:88:7: warning: Value stored to 'mid' during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20230920114141.118919-1-colin.i.king@gmail.com
show more ...
|
#
b9f174c8 |
| 13-Jun-2023 |
Omar Sandoval <osandov@fb.com> |
x86/unwind/orc: Add ELF section with ORC version identifier
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two
x86/unwind/orc: Add ELF section with ORC version identifier
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") changed the ORC format. Although ORC is internal to the kernel, it's the only way for external tools to get reliable kernel stack traces on x86-64. In particular, the drgn debugger [1] uses ORC for stack unwinding, and these format changes broke it [2]. As the drgn maintainer, I don't care how often or how much the kernel changes the ORC format as long as I have a way to detect the change.
It suffices to store a version identifier in the vmlinux and kernel module ELF files (to use when parsing ORC sections from ELF), and in kernel memory (to use when parsing ORC from a core dump+symbol table). Rather than hard-coding a version number that needs to be manually bumped, Peterz suggested hashing the definitions from orc_types.h. If there is a format change that isn't caught by this, the hashing script can be updated.
This patch adds an .orc_header allocated ELF section containing the 20-byte hash to vmlinux and kernel modules, along with the corresponding __start_orc_header and __stop_orc_header symbols in vmlinux.
1: https://github.com/osandov/drgn 2: https://github.com/osandov/drgn/issues/303
Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
show more ...
|
#
02012623 |
| 16-May-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
Revert "x86/orc: Make it callthunk aware"
Commit 396e0b8e09e8 ("x86/orc: Make it callthunk aware") attempted to deal with the fact that function prefix code didn't have ORC coverage. However, it did
Revert "x86/orc: Make it callthunk aware"
Commit 396e0b8e09e8 ("x86/orc: Make it callthunk aware") attempted to deal with the fact that function prefix code didn't have ORC coverage. However, it didn't work as advertised. Use of the "null" ORC entry just caused affected unwinds to end early.
The root cause has now been fixed with commit 5743654f5e2e ("objtool: Generate ORC data for __pfx code").
Revert most of commit 396e0b8e09e8 ("x86/orc: Make it callthunk aware"). The is_callthunk() function remains as it's now used by other code.
Link: https://lore.kernel.org/r/a05b916ef941da872cbece1ab3593eceabd05a79.1684245404.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
show more ...
|
#
89da5a69 |
| 12-Apr-2023 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Add 'unwind_debug' cmdline option
Sometimes the one-line ORC unwinder warnings aren't very helpful. Add a new 'unwind_debug' cmdline option which will dump the full stack contents o
x86/unwind/orc: Add 'unwind_debug' cmdline option
Sometimes the one-line ORC unwinder warnings aren't very helpful. Add a new 'unwind_debug' cmdline option which will dump the full stack contents of the current task when an error condition is encountered.
Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/6afb9e48a05fd2046bfad47e69b061b43dfd0e0e.1681331449.git.jpoimboe@kernel.org Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
show more ...
|
#
95f0e3a2 |
| 30-Mar-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
x86/unwind/orc: Use swap() instead of open coding it
Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function.
./arch/x86/kernel/unwind_orc.
x86/unwind/orc: Use swap() instead of open coding it
Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function.
./arch/x86/kernel/unwind_orc.c:235:16-17: WARNING opportunity for swap().
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4641 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20230330020014.40489-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
show more ...
|
#
fb799447 |
| 01-Mar-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handle
x86,objtool: Split UNWIND_HINT_EMPTY in two
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame.
The problem is UNWIND_HINT_EMPTY is used in two different situations:
1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry
2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame
The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model.
Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
show more ...
|
#
f902cfdd |
| 01-Mar-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things. Separate them out more explicitly.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Pe
x86,objtool: Introduce ORC_TYPE_*
Unwind hints and ORC entry types are two distinct things. Separate them out more explicitly.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc879d38fff8a43f8f7beb2fd56e35a5a384d7cd.1677683419.git.jpoimboe@kernel.org
show more ...
|
#
ffb1b4a4 |
| 10-Feb-2023 |
Josh Poimboeuf <jpoimboe@kernel.org> |
x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and e
x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
show more ...
|
#
230db824 |
| 27-Jul-2022 |
Chen Zhongjin <chenzhongjin@huawei.com> |
x86/unwind/orc: Fix unreliable stack dump with gcov
When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing
x86/unwind/orc: Fix unreliable stack dump with gcov
When a console stack dump is initiated with CONFIG_GCOV_PROFILE_ALL enabled, show_trace_log_lvl() gets out of sync with the ORC unwinder, causing the stack trace to show all text addresses as unreliable:
# echo l > /proc/sysrq-trigger [ 477.521031] sysrq: Show backtrace of all active CPUs [ 477.523813] NMI backtrace for cpu 0 [ 477.524492] CPU: 0 PID: 1021 Comm: bash Not tainted 6.0.0 #65 [ 477.525295] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 [ 477.526439] Call Trace: [ 477.526854] <TASK> [ 477.527216] ? dump_stack_lvl+0xc7/0x114 [ 477.527801] ? dump_stack+0x13/0x1f [ 477.528331] ? nmi_cpu_backtrace.cold+0xb5/0x10d [ 477.528998] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 477.529641] ? nmi_trigger_cpumask_backtrace+0x16a/0x1f0 [ 477.530393] ? arch_trigger_cpumask_backtrace+0x1d/0x30 [ 477.531136] ? sysrq_handle_showallcpus+0x1b/0x30 [ 477.531818] ? __handle_sysrq.cold+0x4e/0x1ae [ 477.532451] ? write_sysrq_trigger+0x63/0x80 [ 477.533080] ? proc_reg_write+0x92/0x110 [ 477.533663] ? vfs_write+0x174/0x530 [ 477.534265] ? handle_mm_fault+0x16f/0x500 [ 477.534940] ? ksys_write+0x7b/0x170 [ 477.535543] ? __x64_sys_write+0x1d/0x30 [ 477.536191] ? do_syscall_64+0x6b/0x100 [ 477.536809] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 477.537609] </TASK>
This happens when the compiled code for show_stack() has a single word on the stack, and doesn't use a tail call to show_stack_log_lvl(). (CONFIG_GCOV_PROFILE_ALL=y is the only known case of this.) Then the __unwind_start() skip logic hits an off-by-one bug and fails to unwind all the way to the intended starting frame.
Fix it by reverting the following commit:
f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
The original justification for that commit no longer exists. That original issue was later fixed in a different way, with the following commit:
f2ac57a4c49d ("x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels")
Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> [jpoimboe: rewrite commit log] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
show more ...
|
#
396e0b8e |
| 15-Sep-2022 |
Peter Zijlstra <peterz@infradead.org> |
x86/orc: Make it callthunk aware
Callthunks addresses on the stack would confuse the ORC unwinder. Handle them correctly and tell ORC to proceed further down the stack.
Signed-off-by: Peter Zijlstr
x86/orc: Make it callthunk aware
Callthunks addresses on the stack would confuse the ORC unwinder. Handle them correctly and tell ORC to proceed further down the stack.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111148.511637628@infradead.org
show more ...
|
#
fc2e426b |
| 19-Aug-2022 |
Chen Zhongjin <chenzhongjin@huawei.com> |
x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which ge
x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176.
If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address.
By adding the base address ftrace_{regs_}caller with the offset *ip - ops->trampoline*, we can get the correct address to find the ORC entry.
Also change "caller" to "tramp_addr" to make variable name conform to its content.
[ mingo: Clarified the changelog a bit. ]
Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
show more ...
|
#
6c8ef58a |
| 19-Apr-2022 |
Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> |
x86/unwind/orc: Recheck address range after stack info was updated
A crash was observed in the ORC unwinder:
BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000
x86/unwind/orc: Recheck address range after stack info was updated
A crash was observed in the ORC unwinder:
BUG: stack guard page was hit at 000000000dd984a2 (stack is 00000000d1caafca..00000000613712f0) kernel stack overflow (page fault): 0000 [#1] SMP NOPTI CPU: 93 PID: 23787 Comm: context_switch1 Not tainted 5.4.145 #1 RIP: 0010:unwind_next_frame Call Trace: <NMI> perf_callchain_kernel get_perf_callchain perf_callchain perf_prepare_sample perf_event_output_forward __perf_event_overflow perf_ibs_handle_irq perf_ibs_nmi_handler nmi_handle default_do_nmi do_nmi end_repeat_nmi
This was really two bugs:
1) The perf IBS code passed inconsistent regs to the unwinder.
2) The unwinder didn't handle the bad input gracefully.
Fix the latter bug. The ORC unwinder needs to be immune against bad inputs. The problem is that stack_access_ok() doesn't recheck the validity of the full range of registers after switching to the next valid stack with get_stack_info(). Fix that.
[ jpoimboe: rewrote commit log ]
Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/1650353656-956624-1-git-send-email-dmtrmonakhov@yandex-team.ru Signed-off-by: Peter Zijlstra <peterz@infradead.org>
show more ...
|
#
f3a112c0 |
| 26-Mar-2022 |
Masami Hiramatsu <mhiramat@kernel.org> |
x86,rethook,kprobes: Replace kretprobe with rethook on x86
Replaces the kretprobe code with rethook on x86. With this patch, kretprobe on x86 uses the rethook instead of kretprobe specific trampolin
x86,rethook,kprobes: Replace kretprobe with rethook on x86
Replaces the kretprobe code with rethook on x86. With this patch, kretprobe on x86 uses the rethook instead of kretprobe specific trampoline code.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2
show more ...
|
#
b9ad8fe7 |
| 09-Nov-2021 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
sections: move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, the
sections: move is_kernel_inittext() into sections.h
The is_kernel_inittext() and init_kernel_text() are with same functionality, let's just keep is_kernel_inittext() and move it into sections.h, then update all the callers.
Link: https://lkml.kernel.org/r/20210930071143.63410-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
19138af1 |
| 14-Sep-2021 |
Masami Hiramatsu <mhiramat@kernel.org> |
x86/unwind: Recover kretprobe trampoline entry
Since the kretprobe replaces the function return address with the kretprobe_trampoline on the stack, x86 unwinders can not continue the stack unwinding
x86/unwind: Recover kretprobe trampoline entry
Since the kretprobe replaces the function return address with the kretprobe_trampoline on the stack, x86 unwinders can not continue the stack unwinding at that point, or record kretprobe_trampoline instead of correct return address.
To fix this issue, find the correct return address from task's kretprobe_instances as like as function-graph tracer does.
With this fix, the unwinder can correctly unwind the stack from kretprobe event on x86, as below.
<...>-135 [003] ...1 6.722338: r_full_proxy_read_0: (vfs_read+0xab/0x1a0 <- full_proxy_read) <...>-135 [003] ...1 6.722377: <stack trace> => kretprobe_trace_func+0x209/0x2f0 => kretprobe_dispatcher+0x4a/0x70 => __kretprobe_trampoline_handler+0xca/0x150 => trampoline_handler+0x44/0x70 => kretprobe_trampoline+0x2a/0x50 => vfs_read+0xab/0x1a0 => ksys_read+0x5f/0xe0 => do_syscall_64+0x33/0x40 => entry_SYSCALL_64_after_hwframe+0x44/0xae
Link: https://lkml.kernel.org/r/163163055130.489837.5161749078833497255.stgit@devnote2
Reported-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
show more ...
|
#
b59cc976 |
| 05-Feb-2021 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Silence warnings caused by missing ORC data
The ORC unwinder attempts to fall back to frame pointers when ORC data is missing for a given instruction. It sets state->error, but then
x86/unwind/orc: Silence warnings caused by missing ORC data
The ORC unwinder attempts to fall back to frame pointers when ORC data is missing for a given instruction. It sets state->error, but then tries to keep going as a best-effort type of thing. That may result in further warnings if the unwinder gets lost.
Until we have some way to register generated code with the unwinder, missing ORC will be expected, and occasionally going off the rails will also be expected. So don't warn about it.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ivan Babrou <ivan@cloudflare.com> Link: https://lkml.kernel.org/r/06d02c4bbb220bd31668db579278b0352538efbb.1612534649.git.jpoimboe@redhat.com
show more ...
|
#
e504e74c |
| 05-Feb-2021 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
KASAN reserves "redzone" areas between stack frames in order to detect stack overruns. A read or write to such an area triggers a
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
KASAN reserves "redzone" areas between stack frames in order to detect stack overruns. A read or write to such an area triggers a KASAN "stack-out-of-bounds" BUG.
Normally, the ORC unwinder stays in-bounds and doesn't access the redzone. But sometimes it can't find ORC metadata for a given instruction. This can happen for code which is missing ORC metadata, or for generated code. In such cases, the unwinder attempts to fall back to frame pointers, as a best-effort type thing.
This fallback often works, but when it doesn't, the unwinder can get confused and go off into the weeds into the KASAN redzone, triggering the aforementioned KASAN BUG.
But in this case, the unwinder's confusion is actually harmless and working as designed. It already has checks in place to prevent off-stack accesses, but those checks get short-circuited by the KASAN BUG. And a BUG is a lot more disruptive than a harmless unwinder warning.
Disable the KASAN checks by using READ_ONCE_NOCHECK() for all stack accesses. This finishes the job started by commit 881125bfe65b ("x86/unwind: Disable KASAN checking in the ORC unwinder"), which only partially fixed the issue.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reported-by: Ivan Babrou <ivan@cloudflare.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ivan Babrou <ivan@cloudflare.com> Cc: stable@kernel.org Link: https://lkml.kernel.org/r/9583327904ebbbeda399eca9c56d6c7085ac20fe.1612534649.git.jpoimboe@redhat.com
show more ...
|
#
87ccc826 |
| 03-Feb-2021 |
Peter Zijlstra <peterz@infradead.org> |
x86/unwind/orc: Change REG_SP_INDIRECT
Currently REG_SP_INDIRECT is unused but means (%rsp + offset), change it to mean (%rsp) + offset.
The reason is that we're going to swizzle stack in the middl
x86/unwind/orc: Change REG_SP_INDIRECT
Currently REG_SP_INDIRECT is unused but means (%rsp + offset), change it to mean (%rsp) + offset.
The reason is that we're going to swizzle stack in the middle of a C function with non-trivial stack footprint. This means that when the unwinder finds the ToS, it needs to dereference it (%rsp) and then add the offset to the next frame, resulting in: (%rsp) + offset
This is somewhat unfortunate, since REG_BP_INDIRECT is used (by DRAP) and thus needs to retain the current (%rbp + offset).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
show more ...
|
#
f2ac57a4 |
| 14-Oct-2020 |
Jiri Slaby <jslaby@suse.cz> |
x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
GCC 10 optimizes the scheduler code differently than its predecessors.
When CONFIG_DEBUG_SECTION_MISMATCH=y,
x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels
GCC 10 optimizes the scheduler code differently than its predecessors.
When CONFIG_DEBUG_SECTION_MISMATCH=y, the Makefile forces GCC not to inline some functions (-fno-inline-functions-called-once). Before GCC 10, "no-inlined" __schedule() starts with the usual prologue:
push %bp mov %sp, %bp
So the ORC unwinder simply picks stack pointer from %bp and unwinds from __schedule() just perfectly:
$ cat /proc/1/stack [<0>] ep_poll+0x3e9/0x450 [<0>] do_epoll_wait+0xaa/0xc0 [<0>] __x64_sys_epoll_wait+0x1a/0x20 [<0>] do_syscall_64+0x33/0x40 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
But now, with GCC 10, there is no %bp prologue in __schedule():
$ cat /proc/1/stack <nothing>
The ORC entry of the point in __schedule() is:
sp:sp+88 bp:last_sp-48 type:call end:0
In this case, nobody subtracts sizeof "struct inactive_task_frame" in __unwind_start(). The struct is put on the stack by __switch_to_asm() and only then __switch_to_asm() stores %sp to task->thread.sp. But we start unwinding from a point in __schedule() (stored in frame->ret_addr by 'call') and not in __switch_to_asm().
So for these example values in __unwind_start():
sp=ffff94b50001fdc8 bp=ffff8e1f41d29340 ip=__schedule+0x1f0
The stack is:
ffff94b50001fdc8: ffff8e1f41578000 # struct inactive_task_frame ffff94b50001fdd0: 0000000000000000 ffff94b50001fdd8: ffff8e1f41d29340 ffff94b50001fde0: ffff8e1f41611d40 # ... ffff94b50001fde8: ffffffff93c41920 # bx ffff94b50001fdf0: ffff8e1f41d29340 # bp ffff94b50001fdf8: ffffffff9376cad0 # ret_addr (and end of the struct)
0xffffffff9376cad0 is __schedule+0x1f0 (after the call to __switch_to_asm). Now follow those 88 bytes from the ORC entry (sp+88). The entry is correct, __schedule() really pushes 48 bytes (8*7) + 32 bytes via subq to store some local values (like 4U below). So to unwind, look at the offset 88-sizeof(long) = 0x50 from here:
ffff94b50001fe00: ffff8e1f41578618 ffff94b50001fe08: 00000cc000000255 ffff94b50001fe10: 0000000500000004 ffff94b50001fe18: 7793fab6956b2d00 # NOTE (see below) ffff94b50001fe20: ffff8e1f41578000 ffff94b50001fe28: ffff8e1f41578000 ffff94b50001fe30: ffff8e1f41578000 ffff94b50001fe38: ffff8e1f41578000 ffff94b50001fe40: ffff94b50001fed8 ffff94b50001fe48: ffff8e1f41577ff0 ffff94b50001fe50: ffffffff9376cf12
Here ^^^^^^^^^^^^^^^^ is the correct ret addr from __schedule(). It translates to schedule+0x42 (insn after a call to __schedule()).
BUT, unwind_next_frame() tries to take the address starting from 0xffff94b50001fdc8. That is exactly from thread.sp+88-sizeof(long) = 0xffff94b50001fdc8+88-8 = 0xffff94b50001fe18, which is garbage marked as NOTE above. So this quits the unwinding as 7793fab6956b2d00 is obviously not a kernel address.
There was a fix to skip 'struct inactive_task_frame' in unwind_get_return_address_ptr in the following commit:
187b96db5ca7 ("x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks")
But we need to skip the struct already in the unwinder proper. So subtract the size (increase the stack pointer) of the structure in __unwind_start() directly. This allows for removal of the code added by commit 187b96db5ca7 completely, as the address is now at '(unsigned long *)state->sp - 1', the same as in the generic case.
[ mingo: Cleaned up the changelog a bit, for better readability. ]
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Bug: https://bugzilla.suse.com/show_bug.cgi?id=1176907 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20201014053051.24199-1-jslaby@suse.cz
show more ...
|
#
ee819aed |
| 04-Sep-2020 |
Julien Thierry <jthierry@redhat.com> |
objtool: Make unwind hint definitions available to other architectures
Unwind hints are useful to provide objtool with information about stack states in non-standard functions/code.
While the type
objtool: Make unwind hint definitions available to other architectures
Unwind hints are useful to provide objtool with information about stack states in non-standard functions/code.
While the type of information being provided might be very arch specific, the mechanism to provide the information can be useful for other architectures.
Move the relevant unwint hint definitions for all architectures to see.
[ jpoimboe: REGS_IRET -> REGS_PARTIAL ]
Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
show more ...
|
#
372a8eaa |
| 17-Jul-2020 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Fix ORC for newly forked tasks
The ORC unwinder fails to unwind newly forked tasks which haven't yet run on the CPU. It correctly reads the 'ret_from_fork' instruction pointer from
x86/unwind/orc: Fix ORC for newly forked tasks
The ORC unwinder fails to unwind newly forked tasks which haven't yet run on the CPU. It correctly reads the 'ret_from_fork' instruction pointer from the stack, but it incorrectly interprets that value as a call stack address rather than a "signal" one, so the address gets incorrectly decremented in the call to orc_find(), resulting in bad ORC data.
Fix it by forcing 'ret_from_fork' frames to be signal frames.
Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lkml.kernel.org/r/f91a8778dde8aae7f71884b5df2b16d552040441.1594994374.git.jpoimboe@redhat.com
show more ...
|
#
187b96db |
| 22-May-2020 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with un
x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored.
Since the following commit:
f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?').
That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to being printed.
The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly.
Fixes: f1d9a2abff66 ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
show more ...
|
#
71c95825 |
| 14-May-2020 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Fix error handling in __unwind_start()
The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all err
x86/unwind/orc: Fix error handling in __unwind_start()
The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all errors in __unwind_start().
Also, move the zeroing out of the unwind_state struct to before the ORC table initialization check, to prevent the caller from reading uninitialized data if the ORC table is corrupted.
Fixes: af085d9084b4 ("stacktrace/x86: add function for detecting reliable stack traces") Fixes: d3a09104018c ("x86/unwinder/orc: Dont bail on stack overflow") Fixes: 98d0c8ebf77e ("x86/unwind/orc: Prevent unwinding before ORC initialization") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
show more ...
|
#
fb9cbbc8 |
| 28-Apr-2020 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
Fix the following warnings seen with !CONFIG_MODULES:
arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not
x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
Fix the following warnings seen with !CONFIG_MODULES:
arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable] 29 | static struct orc_entry *cur_orc_table = __start_orc_unwind; | ^~~~~~~~~~~~~ arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable] 28 | static int *cur_orc_ip_table = __start_orc_unwind_ip; | ^~~~~~~~~~~~~~~~
Fixes: 153eb2223c79 ("x86/unwind/orc: Convert global variables to static") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linux Next Mailing List <linux-next@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
show more ...
|
#
81b67439 |
| 25-Apr-2020 |
Josh Poimboeuf <jpoimboe@redhat.com> |
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
The following execution path is possible:
fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
The following execution path is possible:
fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ]
When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely.
However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI.
Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values.
Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
show more ...
|