#
08701813 |
| 22-Jan-2024 |
Oleg Nesterov <oleg@redhat.com> |
ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped()
Turn send_sig_info(SIGSTOP) into send_signal_locked(SIGSTOP) and move it from ptrace_attach() to ptrace_set_stopped().
This looks more l
ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped()
Turn send_sig_info(SIGSTOP) into send_signal_locked(SIGSTOP) and move it from ptrace_attach() to ptrace_set_stopped().
This looks more logical and avoids lock(siglock) right after unlock().
Link: https://lkml.kernel.org/r/20240122171631.GA29844@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
27bbb2a0 |
| 21-Nov-2023 |
Oleg Nesterov <oleg@redhat.com> |
__ptrace_unlink: kill the obsolete "FIXME" code
The corner case described by the comment is no longer possible after the commit 7b3c36fc4c23 ("ptrace: fix task_join_group_stop() for the case when cu
__ptrace_unlink: kill the obsolete "FIXME" code
The corner case described by the comment is no longer possible after the commit 7b3c36fc4c23 ("ptrace: fix task_join_group_stop() for the case when current is traced"), task_join_group_stop() ensures that the new thread has the correct signr in JOBCTL_STOP_SIGMASK regardless of ptrace.
Link: https://lkml.kernel.org/r/20231121162650.GA6635@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
5431fdd2 |
| 17-Sep-2023 |
Peter Zijlstra <peterz@infradead.org> |
ptrace: Convert ptrace_attach() to use lock guards
Created as testing for the conditional guard infrastructure. Specifically this makes use of the following form:
scoped_cond_guard (mutex_intr, r
ptrace: Convert ptrace_attach() to use lock guards
Created as testing for the conditional guard infrastructure. Specifically this makes use of the following form:
scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR, &task->signal->cred_guard_mutex) { ... } ... return 0;
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20231102110706.568467727%40infradead.org
show more ...
|
#
c43cfa42 |
| 02-Oct-2023 |
Lorenzo Stoakes <lstoakes@gmail.com> |
mm: make __access_remote_vm() static
Patch series "various improvements to the GUP interface", v2.
A series of fixes to simplify and improve the GUP interface with an eye to providing groundwork to
mm: make __access_remote_vm() static
Patch series "various improvements to the GUP interface", v2.
A series of fixes to simplify and improve the GUP interface with an eye to providing groundwork to future improvements:-
* __access_remote_vm() and access_remote_vm() are functionally identical, so make the former static such that in future we can potentially change the external-facing implementation details of this function.
* Extend is_valid_gup_args() to cover the missing FOLL_TOUCH case, and simplify things by defining INTERNAL_GUP_FLAGS to check against.
* Adjust __get_user_pages_locked() to explicitly treat a failure to pin any pages as an error in all circumstances other than FOLL_NOWAIT being specified, bringing it in line with the nommu implementation of this function.
* (With many thanks to Arnd who suggested this in the first instance) Update get_user_page_vma_remote() to explicitly only return a page or an error, simplifying the interface and avoiding the questionable IS_ERR_OR_NULL() pattern.
This patch (of 4):
access_remote_vm() passes through parameters to __access_remote_vm() directly, so remove the __access_remote_vm() function from mm.h and use access_remote_vm() in the one caller that needs it (ptrace_access_vm()).
This allows future adjustments to the GUP-internal __access_remote_vm() function while keeping the access_remote_vm() function stable.
Link: https://lkml.kernel.org/r/cover.1696288092.git.lstoakes@gmail.com Link: https://lkml.kernel.org/r/f7877c5039ce1c202a514a8aeeefc5cdd5e32d19.1696288092.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
3f67987c |
| 07-Apr-2023 |
Gregory Price <gourry.memverge@gmail.com> |
ptrace: Provide set/get interface for syscall user dispatch
The syscall user dispatch configuration can only be set by the task itself, but lacks a ptrace set/get interface which makes it impossible
ptrace: Provide set/get interface for syscall user dispatch
The syscall user dispatch configuration can only be set by the task itself, but lacks a ptrace set/get interface which makes it impossible to implement checkpoint/restore for it.
Add the required ptrace requests and the get/set functions in the syscall user dispatch code to make that possible.
Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230407171834.3558-4-gregory.price@memverge.com
show more ...
|
#
ee3e3ac0 |
| 22-Nov-2022 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
rseq: Introduce extensible rseq ABI
Introduce the extensible rseq ABI, where the feature size supported by the kernel and the required alignment are communicated to user-space through ELF auxiliary
rseq: Introduce extensible rseq ABI
Introduce the extensible rseq ABI, where the feature size supported by the kernel and the required alignment are communicated to user-space through ELF auxiliary vectors.
This allows user-space to call rseq registration with a rseq_len of either 32 bytes for the original struct rseq size (which includes padding), or larger.
If rseq_len is larger than 32 bytes, then it must be large enough to contain the feature size communicated to user-space through ELF auxiliary vectors.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-4-mathieu.desnoyers@efficios.com
show more ...
|
#
f5d39b02 |
| 22-Aug-2022 |
Peter Zijlstra <peterz@infradead.org> |
freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler in general.
By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensu
freezer,sched: Rewrite core freezer logic
Rewrite the core freezer to behave better wrt thawing and be simpler in general.
By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible.
As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!).
Specifically; the current scheme works a little like:
freezer_do_not_count(); schedule(); freezer_count();
And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues.
However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time.
That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back.
This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run.
As said; replace this with a special task state TASK_FROZEN and add the following state transitions:
TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN
The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost).
The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl.
With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
show more ...
|
#
de2a3477 |
| 06-Jul-2022 |
Sven Schnelle <svens@linux.ibm.com> |
ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()
CI reported the following splat while running the strace testsuite:
WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_ch
ptrace: fix clearing of JOBCTL_TRACED in ptrace_unfreeze_traced()
CI reported the following splat while running the strace testsuite:
WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178 CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1 Hardware name: IBM 3906 M04 704 (z/VM 7.1.0) Call Trace: [<00000000ab4b645a>] ptrace_check_attach+0x132/0x178 ([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178) [<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160 [<00000000ac03fcec>] __do_syscall+0x1d4/0x200 [<00000000ac04e312>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190
This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED state. Caused by ptrace_unfreeze_traced() which does:
task->jobctl &= ~TASK_TRACED
but it should be:
task->jobctl &= ~JOBCTL_TRACED
Fixes: 31cae1eaae4f ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
31cae1ea |
| 03-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, th
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else.
There's two spots of bother with this:
- PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable.
- An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour.
As such, add additional state to task->jobctl to track this state outside of task->__state.
NOTE: this doesn't actually fix anything yet, just adds extra state.
--EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
5b4197cb |
| 29-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Always take siglock in ptrace_resume
Make code analysis simpler and future changes easier by always taking siglock in ptrace_resume.
Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by
ptrace: Always take siglock in ptrace_resume
Make code analysis simpler and future changes easier by always taking siglock in ptrace_resume.
Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-11-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
2500ad1c |
| 29-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Don't change __state
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing.
Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and impleme
ptrace: Don't change __state
Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing.
Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up.
Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set.
Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN.
Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
7b0fe136 |
| 05-May-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Document that wait_task_inactive can't fail
After ptrace_freeze_traced succeeds it is known that the tracee has a __state value of __TASK_TRACED and that no __ptrace_unlink will happen becau
ptrace: Document that wait_task_inactive can't fail
After ptrace_freeze_traced succeeds it is known that the tracee has a __state value of __TASK_TRACED and that no __ptrace_unlink will happen because the tracer is waiting for the tracee, and the tracee is in ptrace_stop.
The function ptrace_freeze_traced can succeed at any point after ptrace_stop has set TASK_TRACED and dropped siglock. The read_lock on tasklist_lock only excludes ptrace_attach.
This means that the !current->ptrace which executes under a read_lock of tasklist_lock will never see a ptrace_freeze_trace as the tracer must have gone away before the tasklist_lock was taken and ptrace_attach can not occur until the read_lock is dropped. As ptrace_freeze_traced depends upon ptrace_attach running before it can run that excludes ptrace_freeze_traced until __state is set to TASK_RUNNING. This means that task_is_traced will fail in ptrace_freeze_attach and ptrace_freeze_attached will fail.
On the current->ptrace branch of ptrace_stop which will be reached any time after ptrace_freeze_traced has succeed it is known that __state is __TASK_TRACED and schedule() will be called with that state.
Use a WARN_ON_ONCE to document that wait_task_inactive(TASK_TRACED) should never fail. Remove the stale comment about may_ptrace_stop.
Strictly speaking this is not true because if PREEMPT_RT is enabled wait_task_inactive can fail because __state can be changed. I don't see this as a problem as the ptrace code is currently broken on PREMPT_RT, and this is one of the issues. Failing and warning when the assumptions of the code are broken is good.
Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
6a2d90ba |
| 29-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
The current implementation of PTRACE_KILL is buggy and has been for many years as it assumes it's target has stopped in ptrace_stop. At a q
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
The current implementation of PTRACE_KILL is buggy and has been for many years as it assumes it's target has stopped in ptrace_stop. At a quick skim it looks like this assumption has existed since ptrace support was added in linux v1.0.
While PTRACE_KILL has been deprecated we can not remove it as a quick search with google code search reveals many existing programs calling it.
When the ptracee is not stopped at ptrace_stop some fields would be set that are ignored except in ptrace_stop. Making the userspace visible behavior of PTRACE_KILL a noop in those case.
As the usual rules are not obeyed it is not clear what the consequences are of calling PTRACE_KILL on a running process. Presumably userspace does not do this as it achieves nothing.
Replace the implementation of PTRACE_KILL with a simple send_sig_info(SIGKILL) followed by a return 0. This changes the observable user space behavior only in that PTRACE_KILL on a process not stopped in ptrace_stop will also kill it. As that has always been the intent of the code this seems like a reasonable change.
Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
16cc1bc6 |
| 29-Apr-2022 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Remove arch_ptrace_attach
The last remaining implementation of arch_ptrace_attach is ia64's ptrace_attach_sync_user_rbs which was added at the end of 2007 in commit aa91a2e90044 ("[IA64] Syn
ptrace: Remove arch_ptrace_attach
The last remaining implementation of arch_ptrace_attach is ia64's ptrace_attach_sync_user_rbs which was added at the end of 2007 in commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").
Reading the comments and examining the code ptrace_attach_sync_user_rbs has the sole purpose of saving registers to the stack when ptrace_attach changes TASK_STOPPED to TASK_TRACED. In all other cases arch_ptrace_stop takes care of the register saving.
In commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED") modified ptrace_attach to wake up the thread and enter ptrace_stop normally even when the thread starts out stopped.
This makes ptrace_attach_sync_user_rbs completely unnecessary. So just remove it.
I read through the code to verify that ptrace_attach_sync_user_rbs is unnecessary. What I found is that the code is quite dead.
Reading ptrace_attach_sync_user_rbs it is easy to see that the it does nothing unless __state == TASK_STOPPED.
Calling arch_ptrace_attach (aka ptrace_attach_sync_user_rbs) after ptrace_traceme it is easy to see that because we are talking about the current process the value of __state is TASK_RUNNING. Which means ptrace_attach_sync_user_rbs does nothing.
The only other call of arch_ptrace_attach (aka ptrace_attach_sync_user_rbs) is after ptrace_attach.
If the task is running (and PTRACE_SEIZE is not specified), a SIGSTOP is sent which results in do_signal_stop setting JOBCTL_TRAP_STOP on the target task (as it is ptraced) and the target task stopping in ptrace_stop with __state == TASK_TRACED.
If the task was already stopped then ptrace_attach sets JOBCTL_TRAPPING and JOBCTL_TRAP_STOP, wakes it out of __TASK_STOPPED, and waits until the JOBCTL_TRAPPING_BIT is clear. At which point the task stops in ptrace_stop.
In both cases there are a couple of funning excpetions such as if the traced task receiveds a SIGCONT, or is set a fatal signal.
However in all of those cases the tracee never stops in __state TASK_STOPPED. Which is a long way of saying that ptrace_attach_sync_user_rbs is guaranteed never to do anything.
Cc: linux-ia64@vger.kernel.org Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
f26b2afd |
| 29-Apr-2022 |
Tiezhu Yang <yangtiezhu@loongson.cn> |
ptrace: remove redudant check of #ifdef PTRACE_SINGLESTEP
Patch series "ptrace: do some cleanup".
This patch (of 3):
PTRACE_SINGLESTEP is always defined as 9 in include/uapi/linux/ptrace.h, remov
ptrace: remove redudant check of #ifdef PTRACE_SINGLESTEP
Patch series "ptrace: do some cleanup".
This patch (of 3):
PTRACE_SINGLESTEP is always defined as 9 in include/uapi/linux/ptrace.h, remove redudant check of #ifdef PTRACE_SINGLESTEP.
Link: https://lkml.kernel.org/r/1649240981-11024-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ee1fee90 |
| 19-Mar-2022 |
Jann Horn <jannh@google.com> |
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged operation because it allows the tracee to completely bypass a
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged operation because it allows the tracee to completely bypass all seccomp filters on kernels with CONFIG_CHECKPOINT_RESTORE=y. It is only supposed to be settable by a process with global CAP_SYS_ADMIN, and only if that process is not subject to any seccomp filters at all.
However, while these permission checks were done on the PTRACE_SETOPTIONS path, they were missing on the PTRACE_SEIZE path, which also sets user-specified ptrace flags.
Move the permissions checks out into a helper function and let both ptrace_attach() and ptrace_setoptions() call it.
Cc: stable@kernel.org Fixes: 13c4a90119d2 ("seccomp: add ptrace options for suspend/resume") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20220319010838.1386861-1-jannh@google.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
6707d0fc |
| 20-Dec-2021 |
Eric W. Biederman <ebiederm@xmission.com> |
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
The code is totally redundant remove it.
Link: https://lkml.kernel.org/r/20220103213312.9144-6-ebiederm@xmission.com Signed-off-by: "Eric
ptrace: Remove second setting of PT_SEIZED in ptrace_attach
The code is totally redundant remove it.
Link: https://lkml.kernel.org/r/20220103213312.9144-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
2f064a59 |
| 11-Jun-2021 |
Peter Zijlstra <peterz@infradead.org> |
sched: Change task_struct::state
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_O
sched: Change task_struct::state
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
show more ...
|
#
dbb5afad |
| 12-May-2021 |
Oleg Nesterov <oleg@redhat.com> |
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
Suppose we have 2 threads, the group-leader L and a sub-theread T, both parked in ptrace_stop(). Debugger tries to resume both t
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
Suppose we have 2 threads, the group-leader L and a sub-theread T, both parked in ptrace_stop(). Debugger tries to resume both threads and does
ptrace(PTRACE_CONT, T); ptrace(PTRACE_CONT, L);
If the sub-thread T execs in between, the 2nd PTRACE_CONT doesn not resume the old leader L, it resumes the post-exec thread T which was actually now stopped in PTHREAD_EVENT_EXEC. In this case the PTHREAD_EVENT_EXEC event is lost, and the tracer can't know that the tracee changed its pid.
This patch makes ptrace() fail in this case until debugger does wait() and consumes PTHREAD_EVENT_EXEC which reports old_pid. This affects all ptrace requests except the "asynchronous" PTRACE_INTERRUPT/KILL.
The patch doesn't add the new PTRACE_ option to not complicate the API, and I _hope_ this won't cause any noticeable regression:
- If debugger uses PTRACE_O_TRACEEXEC and the thread did an exec and the tracer does a ptrace request without having consumed the exec event, it's 100% sure that the thread the ptracer thinks it is targeting does not exist anymore, or isn't the same as the one it thinks it is targeting.
- To some degree this patch adds nothing new. In the scenario above ptrace(L) can fail with -ESRCH if it is called after the execing sub-thread wakes the leader up and before it "steals" the leader's pid.
Test-case:
#include <stdio.h> #include <unistd.h> #include <signal.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <errno.h> #include <pthread.h> #include <assert.h>
void *tf(void *arg) { execve("/usr/bin/true", NULL, NULL); assert(0);
return NULL; }
int main(void) { int leader = fork(); if (!leader) { kill(getpid(), SIGSTOP);
pthread_t th; pthread_create(&th, NULL, tf, NULL); for (;;) pause();
return 0; }
waitpid(leader, NULL, WSTOPPED);
ptrace(PTRACE_SEIZE, leader, 0, PTRACE_O_TRACECLONE | PTRACE_O_TRACEEXEC); waitpid(leader, NULL, 0);
ptrace(PTRACE_CONT, leader, 0,0); waitpid(leader, NULL, 0);
int status, thread = waitpid(-1, &status, 0); assert(thread > 0 && thread != leader); assert(status == 0x80137f);
ptrace(PTRACE_CONT, thread, 0,0); /* * waitid() because waitpid(leader, &status, WNOWAIT) does not * report status. Why ???? * * Why WEXITED? because we have another kernel problem connected * to mt-exec. */ siginfo_t info; assert(waitid(P_PID, leader, &info, WSTOPPED|WEXITED|WNOWAIT) == 0); assert(info.si_pid == leader && info.si_status == 0x0405);
/* OK, it sleeps in ptrace(PTRACE_EVENT_EXEC == 0x04) */ assert(ptrace(PTRACE_CONT, leader, 0,0) == -1); assert(errno == ESRCH);
assert(leader == waitpid(leader, &status, WNOHANG)); assert(status == 0x04057f);
assert(ptrace(PTRACE_CONT, leader, 0,0) == 0);
return 0; }
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Simon Marchi <simon.marchi@efficios.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pedro Alves <palves@redhat.com> Acked-by: Simon Marchi <simon.marchi@efficios.com> Acked-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
e8b33b8c |
| 26-Mar-2021 |
Jens Axboe <axboe@kernel.dk> |
Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.
The IO threads do allow signals now, including SIGSTOP, and we c
Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.
The IO threads do allow signals now, including SIGSTOP, and we can allow ptrace attach. Attaching won't reveal anything interesting for the IO threads, but it will allow eg gdb to attach to a task with io_urings and IO threads without complaining. And once attached, it will allow the usual introspection into regular threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
90f093fa |
| 26-Feb-2021 |
Piotr Figiel <figiel@google.com> |
rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION request
For userspace checkpoint and restore (C/R) a way of getting process state containing RSEQ configuration is needed.
There are two ways this in
rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION request
For userspace checkpoint and restore (C/R) a way of getting process state containing RSEQ configuration is needed.
There are two ways this information is going to be used: - to re-enable RSEQ for threads which had it enabled before C/R - to detect if a thread was in a critical section during C/R
Since C/R preserves TLS memory and addresses RSEQ ABI will be restored using the address registered before C/R.
Detection whether the thread is in a critical section during C/R is needed to enforce behavior of RSEQ abort during C/R. Attaching with ptrace() before registers are dumped itself doesn't cause RSEQ abort. Restoring the instruction pointer within the critical section is problematic because rseq_cs may get cleared before the control is passed to the migrated application code leading to RSEQ invariants not being preserved. C/R code will use RSEQ ABI address to find the abort handler to which the instruction pointer needs to be set.
To achieve above goals expose the RSEQ ABI address and the signature value with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.
This new ptrace request can also be used by debuggers so they are aware of stops within restartable sequences in progress.
Signed-off-by: Piotr Figiel <figiel@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michal Miroslaw <emmir@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com
show more ...
|
#
6fb8f43c |
| 18-Feb-2021 |
Jens Axboe <axboe@kernel.dk> |
kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d3f5ffca |
| 15-Dec-2020 |
John Hubbard <jhubbard@nvidia.com> |
mm: cleanup: remove unused tsk arg from __access_remote_vm
Despite a comment that said that page fault accounting would be charged to whatever task_struct* was passed into __access_remote_vm(), the
mm: cleanup: remove unused tsk arg from __access_remote_vm
Despite a comment that said that page fault accounting would be charged to whatever task_struct* was passed into __access_remote_vm(), the tsk argument was actually unused.
Making page fault accounting actually use this task struct is quite a project, so there is no point in keeping the tsk argument.
Delete both the comment, and the argument.
[rppt@linux.ibm.com: changelog addition]
Link: https://lkml.kernel.org/r/20201026074137.4147787-1-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
cf237052 |
| 30-Oct-2020 |
Mickaël Salaün <mic@linux.microsoft.com> |
ptrace: Set PF_SUPERPRIV when checking capability
Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat") replaced the use of ns_capable() with has_ns_capability{
ptrace: Set PF_SUPERPRIV when checking capability
Commit 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat") replaced the use of ns_capable() with has_ns_capability{,_noaudit}() which doesn't set PF_SUPERPRIV.
Commit 6b3ad6649a4c ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") replaced has_ns_capability{,_noaudit}() with security_capable(), which doesn't set PF_SUPERPRIV neither.
Since commit 98f368e9e263 ("kernel: Add noaudit variant of ns_capable()"), a new ns_capable_noaudit() helper is available. Let's use it!
As a result, the signature of ptrace_has_cap() is restored to its original one.
Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: stable@vger.kernel.org Fixes: 6b3ad6649a4c ("ptrace: reintroduce usage of subjective credentials in ptrace_has_cap()") Fixes: 69f594a38967 ("ptrace: do not audit capability check when outputing /proc/pid/stat") Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20201030123849.770769-2-mic@digikod.net
show more ...
|
#
64eb35f7 |
| 16-Nov-2020 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag
On architectures using the generic syscall entry code the architecture independent syscall work is moved to flags in thread_info::syscall_wor
ptrace: Migrate TIF_SYSCALL_EMU to use SYSCALL_WORK flag
On architectures using the generic syscall entry code the architecture independent syscall work is moved to flags in thread_info::syscall_work. This removes architecture dependencies and frees up TIF bits.
Define SYSCALL_WORK_SYSCALL_EMU, use it in the generic entry code and convert the code which uses the TIF specific helper functions to use the new *_syscall_work() helpers which either resolve to the new mode for users of the generic entry code or to the TIF based functions for the other architectures.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20201116174206.2639648-8-krisman@collabora.com
show more ...
|