#
a436184e |
| 26-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
This initialization is incomplete and unnecessary, neither do_group_exit() nor PF_USER_WORKER need ksig->info.
Link: https://lkml.kernel.org/r/20240226165653.GA20834@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
dd69edd6 |
| 26-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
ksig->ka and ksig->info are not initialized if get_signal() returns 0 or if the caller is PF_USER_WORKER.
Check signr != 0 before SA_EXPOSE_TAGBITS and move the "out" label down.
The latter means that ksig->sig won't be initialized if a PF_USER_WORKER thread gets a fatal signal, but this is fine: PF_USER_WORKERs don't use ksig. And there is nothing new; in this case ksig->ka and ksig->info are not initialized anyway. Add a comment.
Link: https://lkml.kernel.org/r/20240226165650.GA20829@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
49fd5f5a |
| 26-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
get_signal: don't abuse ksig->info.si_signo and ksig->sig
Patch series "get_signal: minor cleanups and fix".
Let's remove this clear_siginfo() right now. It is incomplete (and thus looks confusing) and unnecessary. Also, PF_USER_WORKERs already don't get a fully initialized ksig anyway.
This patch (of 3):
Cleanup and preparation for the next changes.
get_signal() uses signr or ksig->info.si_signo or ksig->sig in a chaotic way, this looks confusing. Change it to always use signr.
Link: https://lkml.kernel.org/r/20240226165612.GA20787@redhat.com Link: https://lkml.kernel.org/r/20240226165647.GA20826@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Wen Yang <wenyang.linux@foxmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e1fb1dc0 |
| 09-Feb-2024 |
Christian Brauner <brauner@kernel.org> |
pidfd: allow to override signal scope in pidfd_send_signal()
Right now we determine the scope of the signal based on the type of pidfd. There are use-cases where it's useful to override the scope of the signal. For example in [1]. Add flags to determine the scope of the signal:
(1) PIDFD_SIGNAL_THREAD: send signal to the specific thread referenced by @pidfd
(2) PIDFD_SIGNAL_THREAD_GROUP: send signal to the thread-group of @pidfd
(3) PIDFD_SIGNAL_PROCESS_GROUP: send signal to the process-group of @pidfd
Since we now allow specifying PIDFD_SIGNAL_PROCESS_GROUP for pidfd_send_signal() to send signals to process groups, we need to adjust the check restricting si_code emulation by userspace to account for PIDTYPE_PGID.
Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://github.com/systemd/systemd/issues/31093 [1] Link: https://lore.kernel.org/r/20240210-chihuahua-hinzog-3945b6abd44a@brauner Link: https://lore.kernel.org/r/20240214123655.GB16265@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
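The scope selection described in this entry can be sketched as a small userspace model. This is an illustration, not the kernel code: the `enum scope` type and the `signal_scope()` helper are hypothetical names, and the flag values are copied from the uapi header `<linux/pidfd.h>` as an assumption for systems whose headers predate them.

```c
#include <fcntl.h>   /* O_EXCL */

/* Assumed values, mirroring <linux/pidfd.h>; only defined here if missing. */
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif
#ifndef PIDFD_SIGNAL_THREAD
#define PIDFD_SIGNAL_THREAD         (1UL << 0)
#define PIDFD_SIGNAL_THREAD_GROUP   (1UL << 1)
#define PIDFD_SIGNAL_PROCESS_GROUP  (1UL << 2)
#endif

/* Hypothetical stand-in for the kernel's pid types (PID, TGID, PGID). */
enum scope {
    SCOPE_THREAD,
    SCOPE_THREAD_GROUP,
    SCOPE_PROCESS_GROUP,
    SCOPE_INVALID
};

/*
 * Model of the decision: with no override flag the scope follows the
 * type of the pidfd; otherwise exactly one override flag picks the scope.
 */
enum scope signal_scope(unsigned int pidfd_file_flags, unsigned long send_flags)
{
    switch (send_flags) {
    case 0:
        return (pidfd_file_flags & PIDFD_THREAD) ? SCOPE_THREAD
                                                 : SCOPE_THREAD_GROUP;
    case PIDFD_SIGNAL_THREAD:
        return SCOPE_THREAD;
    case PIDFD_SIGNAL_THREAD_GROUP:
        return SCOPE_THREAD_GROUP;
    case PIDFD_SIGNAL_PROCESS_GROUP:
        return SCOPE_PROCESS_GROUP;
    default:
        /* Unknown bits or more than one scope flag. */
        return SCOPE_INVALID;
    }
}
```

In this model a thread pidfd combined with PIDFD_SIGNAL_PROCESS_GROUP yields the process-group scope, which is the override the entry describes.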
|
#
81b9d8ac |
| 09-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
Turn kill_pid_info() into kill_pid_info_type(), this allows to pass any pid_type to group_send_sig_info(), despite its name it should work fine even if type = PIDTYPE_PID.
Change pidfd_send_signal() to use PIDTYPE_PID or PIDTYPE_TGID depending on PIDFD_THREAD.
While at it, kill another TODO comment in pidfd_show_fdinfo(). As Christian explains, fdinfo reports f_flags, so userspace can already detect PIDFD_THREAD.
Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240209130650.GA8048@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
c044a950 |
| 09-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
signal: fill in si_code in prepare_kill_siginfo()
So that do_tkill() can use this helper too. This also simplifies the next patch.
TODO: perhaps we can kill prepare_kill_siginfo() and change the callers to use SEND_SIG_NOINFO, but this needs some changes in __send_signal_locked() and TP_STORE_SIGINFO().
Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240209130620.GA8039@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
9ed52108 |
| 05-Feb-2024 |
Oleg Nesterov <oleg@redhat.com> |
pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
rather than wake_up_all(). This way do_notify_pidfd() won't wakeup the POLLHUP-only waiters which wait for pid_task() == NULL.
TODO: - as Christian pointed out, this asks for the new wake_up_all_poll() helper, it can already have other users.
- we can probably discriminate the PIDFD_THREAD and non-PIDFD_THREAD waiters, but this needs more work. See https://lore.kernel.org/all/20240205140848.GA15853@redhat.com/
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240205141348.GA16539@redhat.com Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
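The effect of a keyed wakeup can be illustrated with a minimal userspace model. It is a sketch under assumptions: `struct waiter` and `wake_up_keyed()` are hypothetical names, and the rule "a key of 0 wakes everyone" stands in for wake_up_all().

```c
#include <stddef.h>
#include <sys/epoll.h>  /* EPOLLIN, EPOLLHUP */

/* Hypothetical waiter: the events it is interested in, and a wake marker. */
struct waiter {
    unsigned int events;
    int woken;
};

/*
 * Wake only the waiters whose event mask intersects the key.
 * A key of 0 models an unkeyed wake_up_all(): everybody is woken.
 * Returns the number of waiters woken by this call.
 */
size_t wake_up_keyed(struct waiter *w, size_t n, unsigned int key)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (key && !(w[i].events & key))
            continue;           /* not interested in this event */
        w[i].woken = 1;
        woken++;
    }
    return woken;
}
```

With a key of EPOLLIN, a waiter that only asked for EPOLLHUP stays asleep in this model, which is the behaviour the entry is after for the POLLHUP-only waiters.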
|
#
64bef697 |
| 31-Jan-2024 |
Oleg Nesterov <oleg@redhat.com> |
pidfd: implement PIDFD_THREAD flag for pidfd_open()
With this flag:
- pidfd_open() doesn't require that the target task must be a thread-group leader
- pidfd_poll() succeeds when the task exits and becomes a zombie (iow, passes exit_notify()), even if it is a leader and thread-group is not empty.
This means that the behaviour of pidfd_poll(PIDFD_THREAD, pid-of-group-leader) is not well defined if it races with exec() from its sub-thread; pidfd_poll() can succeed or not depending on whether pidfd_task_exited() is called before or after exchange_tids().
Perhaps we can improve this behaviour later, pidfd_poll() can probably take sig->group_exec_task into account. But this doesn't really differ from the case when the leader exits before other threads (so pidfd_poll() succeeds) and then another thread execs and pidfd_poll() will block again.
thread_group_exited() is no longer used, perhaps it can die.
Co-developed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240131132602.GA23641@redhat.com Tested-by: Tycho Andersen <tandersen@netflix.com> Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
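From userspace, the flag might be used roughly as follows. This is a hedged sketch: `open_thread_pidfd()` is a hypothetical wrapper, the fallback definition of PIDFD_THREAD mirrors the uapi header, and the fallback syscall number 434 is an assumption that holds on common architectures but not all of them. Kernels without the flag are expected to fail the call (typically with EINVAL).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>        /* callers inspect errno on failure */
#include <fcntl.h>        /* O_EXCL */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL    /* assumed value, as in <linux/pidfd.h> */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434     /* assumption: differs on some architectures */
#endif

/*
 * Open a pidfd that refers to a single thread rather than to a whole
 * thread group. The target does not have to be a thread-group leader.
 * Returns the new file descriptor, or -1 with errno set.
 */
int open_thread_pidfd(pid_t tid)
{
    return (int)syscall(SYS_pidfd_open, tid, PIDFD_THREAD);
}
```

A caller would then poll() the returned descriptor for POLLIN to learn that this particular thread has exited, subject to the exec() race the entry describes.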
|
#
21e25205 |
| 27-Jan-2024 |
Oleg Nesterov <oleg@redhat.com> |
pidfd: don't do_notify_pidfd() if !thread_group_empty()
do_notify_pidfd() makes no sense until the whole thread group exits, change do_notify_parent() to check thread_group_empty().
This avoids the unnecessary do_notify_pidfd() when tsk is not a leader, or it exits before other threads, or it has a ptraced EXIT_ZOMBIE sub-thread.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20240127132407.GA29136@redhat.com Reviewed-by: Tycho Andersen <tandersen@netflix.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
b454ec29 |
| 20-Nov-2023 |
Oleg Nesterov <oleg@redhat.com> |
kernel/signal.c: simplify force_sig_info_to_task(), kill recalc_sigpending_and_wake()
The purpose of recalc_sigpending_and_wake() is not clear, it looks "obviously unneeded" because we are going to send the signal which can't be blocked or ignored.
Add the comment to explain why we can't rely on send_signal_locked() and make this logic more simple/explicit. recalc_sigpending_and_wake() has no other users, it can die.
In fact I think we don't even need signal_wake_up(), the target task must be either current or a TASK_TRACED child, otherwise the usage of siglock is not safe. But this needs another change.
Link: https://lkml.kernel.org/r/20231120151649.GA15995@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
61a7a5e2 |
| 30-Oct-2023 |
Oleg Nesterov <oleg@redhat.com> |
introduce for_other_threads(p, t)
Cosmetic, but imho it makes the usage look more clear and simple, the new helper doesn't require to initialize "t".
After this change while_each_thread() has only 3 users, and it is only used in the do/while loops.
Link: https://lkml.kernel.org/r/20231030155710.GA9095@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
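The shape of the helper can be shown with a userspace model over a circular list. This is only a sketch: `struct task`, its `next` pointer and `count_other_threads()` are hypothetical, and the macro body is written to match the description (visit every thread except `p`, without initializing `t` first), not copied from the kernel tree.

```c
#include <stddef.h>

/* Hypothetical model: threads of one group linked in a circular list. */
struct task {
    int tid;
    struct task *next;
};

#define next_thread(t) ((t)->next)

/* Visit every thread in p's group except p itself; t needs no initial value. */
#define for_other_threads(p, t) \
    for ((t) = (p); ((t) = next_thread(t)) != (p); )

/* Example user: count the threads in the group other than p. */
int count_other_threads(struct task *p)
{
    struct task *t;
    int n = 0;

    for_other_threads(p, t)
        n++;
    return n;
}
```

For a single-threaded group, where `p->next == p`, the loop body never runs.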
|
#
a287116a |
| 26-Sep-2023 |
Li kunyu <kunyu@nfschina.com> |
kernel/signal: remove unnecessary NULL values from ucounts
ucounts is assigned before it is first used, so it does not need to be initialized to NULL at its declaration.
Link: https://lkml.kernel.org/r/20230926022410.4280-1-kunyu@nfschina.com Signed-off-by: Li kunyu <kunyu@nfschina.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
e5ecf29c |
| 09-Sep-2023 |
Oleg Nesterov <oleg@redhat.com> |
signal: complete_signal: use __for_each_thread()
do/while_each_thread should be avoided when possible.
Link: https://lkml.kernel.org/r/20230909164537.GA11633@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
39835204 |
| 23-Aug-2023 |
Oleg Nesterov <oleg@redhat.com> |
__kill_pgrp_info: simplify the calculation of return value
No need to calculate/check the "success" variable, we can kill it and update retval in the main loop unless it is zero.
Link: https://lkml.kernel.org/r/20230823171455.GA12188@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
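The return-value rule described here can be modelled in a few lines. This is a sketch under assumptions: `kill_pgrp_result()` is a hypothetical helper, and the array of per-member results replaces the real loop over the process group.

```c
#include <errno.h>
#include <stddef.h>

/*
 * errs[i] is the (hypothetical) result of signalling the i-th member of
 * the process group: 0 on success or a negative errno value.
 */
int kill_pgrp_result(const int *errs, size_t n)
{
    int ret = -ESRCH;   /* returned unchanged if the group is empty */

    for (size_t i = 0; i < n; i++) {
        /*
         * Once one delivery succeeds, ret becomes 0 and is never
         * updated again; until then it tracks the last error seen.
         */
        if (ret)
            ret = errs[i];
    }
    return ret;
}
```

No separate "success" flag is needed: a zero in `ret` already records that at least one delivery worked.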
|
#
1aabbc53 |
| 03-Aug-2023 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
signal: Don't disable preemption in ptrace_stop() on PREEMPT_RT
On PREEMPT_RT keeping preemption disabled during the invocation of cgroup_enter_frozen() is a problem because the function acquires css_set_lock which is a sleeping lock on PREEMPT_RT and must not be acquired with disabled preemption.
The preempt-disabled section is only for performance optimisation reasons and can be avoided.
Extend the comment and don't disable preemption before scheduling on PREEMPT_RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230803100932.325870-3-bigeasy@linutronix.de
|
#
a20d6f63 |
| 03-Aug-2023 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
signal: Add a proper comment about preempt_disable() in ptrace_stop()
Commit 53da1d9456fe7 ("fix ptrace slowness") added a preempt-disable section between read_unlock() and the following schedule() invocation without explaining why it is needed.
Replace the existing contentless comment with a proper explanation to clarify that it is not needed for correctness but for performance reasons.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230803100932.325870-2-bigeasy@linutronix.de
|
#
f5e83688 |
| 13-Jan-2023 |
Ard Biesheuvel <ardb@kernel.org> |
kernel: Drop IA64 support from sig_fault handlers
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
b0b88e02 |
| 07-Jul-2023 |
Vincent Whitchurch <vincent.whitchurch@axis.com> |
signal: print comm and exe name on fatal signals
Make the print-fatal-signals message more useful by printing the comm and the exe name for the process which received the fatal signal:
Before:
potentially unexpected fatal signal 4
potentially unexpected fatal signal 11
After:
buggy-program: pool: potentially unexpected fatal signal 4
some-daemon: gdbus: potentially unexpected fatal signal 11
comm used to be present but was removed in commit 681a90ffe829b8ee25d ("arc, print-fatal-signals: reduce duplicated information") because it's also included as part of the later stack trace. Having the comm as part of the main "unexpected fatal..." print is rather useful though when analysing logs, and the exe name is also valuable as shown in the examples above where the comm ends up having some generic name like "pool".
[akpm@linux-foundation.org: don't include linux/file.h twice] Link: https://lkml.kernel.org/r/20230707-fatal-comm-v1-1-400363905d5e@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
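The "After" layout can be reproduced with a small formatting helper. This is a userspace illustration only: `format_fatal_signal()` is a hypothetical name, and the format string is derived from the example lines in this entry rather than from the kernel source.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<exe>: <comm>: potentially unexpected fatal signal <n>".
 * Returns what snprintf() returns: the length the full message needs,
 * excluding the terminating NUL.
 */
int format_fatal_signal(char *buf, size_t len,
                        const char *exe, const char *comm, int signr)
{
    return snprintf(buf, len,
                    "%s: %s: potentially unexpected fatal signal %d",
                    exe, comm, signr);
}
```

Calling it with "buggy-program", "pool" and 4 produces the first "After" line shown in the entry.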
|
#
5f0bc0b0 |
| 25-Jul-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm: suppress mm fault logging if fatal signal already pending
Commit eda0047296a1 ("mm: make the page fault mmap locking killable") intentionally made it much easier to trigger the "page fault fails because a fatal signal is pending" situation, by having the mmap locking fail early in that case.
We have long aborted page faults in other fatal cases when the actual IO for a page is interrupted by SIGKILL - which is particularly useful for the traditional case of NFS hanging due to network issues, but local filesystems could cause it too if you happened to get the SIGKILL while waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).
So aborting the page fault wasn't a new condition - but it now triggers earlier, before we even get to 'handle_mm_fault()'. And as a result the error doesn't go through our 'fault_signal_pending()' logic, and doesn't get filtered away there.
Normally you'd never even notice, because if a fatal signal is pending, the new SIGSEGV we send ends up being ignored anyway.
But it turns out that there is one very noticeable exception: if you enable 'show_unhandled_signals', the aborted page fault will be logged in the kernel messages, and you'll get a scary line looking something like this in your logs:
pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)
which is rather misleading. It's not really a segfault at all, it's just "the thread was killed before the page fault completed, so we aborted the page fault".
Fix this by just making it clear that a pending fatal signal means that any new signal coming in after that is implicitly handled. This will avoid the misleading logging, since now the signal isn't 'unhandled' any more.
Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/ Acked-by: Oleg Nesterov <oleg@redhat.com> Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
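The rule "a pending fatal signal means any new signal is implicitly handled" can be sketched as a decision function. This is a model, not the kernel function: `struct task_model`, `enum handler_kind` and `unhandled_signal_model()` are hypothetical, and the checks surrounding the fatal-signal test (global init, user handler, ptrace) are assumptions about the context in which the new check sits.

```c
#include <stdbool.h>

/* Hypothetical summary of the state the decision depends on. */
enum handler_kind { HANDLER_DEFAULT, HANDLER_IGNORE, HANDLER_USER };

struct task_model {
    bool is_global_init;
    enum handler_kind handler;
    bool fatal_signal_pending;
    bool ptraced;
};

/* Returns true if the signal should be reported as "unhandled". */
bool unhandled_signal_model(const struct task_model *t)
{
    if (t->is_global_init)
        return true;
    if (t->handler == HANDLER_USER)
        return false;           /* userspace installed a handler */
    if (t->fatal_signal_pending)
        return false;           /* dying: new signals count as handled */
    return !t->ptraced;         /* if traced, the tracer decides */
}
```

With the fatal-signal check in place, the aborted page fault of a task that is already being killed no longer qualifies for the show_unhandled_signals log line in this model.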
|
#
f9010dbd |
| 01-Jun-2023 |
Mike Christie <michael.christie@oracle.com> |
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks now show up as processes, so scripts doing ps or ps a would now incorrectly detect the vhost task as another process.
2. kthreads disabled freeze by setting PF_NOFREEZE, but vhost tasks didn't disable or add support for them.
To fix both bugs, this switches the vhost task to be a thread in the process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that SIGKILL/STOP support is required because CLONE_THREAD requires CLONE_SIGHAND, which requires those 2 signals to be supported.
This is a modified version of the patch written by Mike Christie <michael.christie@oracle.com> which was a modified version of patch originally written by Linus.
Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER. Including ignoring signals, setting up the register state, and having get_signal return instead of calling do_group_exit.
Tidied up the vhost_task abstraction so that the definition of vhost_task only needs to be visible inside of vhost_task.c. Making it easier to review the code and tell what needs to be done where. As part of this the main loop has been moved from vhost_worker into vhost_task_fn. vhost_worker now returns true if work was done.
The main loop has been updated to call get_signal which handles SIGSTOP, freezing, and collects the message that tells the thread to exit as part of process exit. This collection clears __fatal_signal_pending. This collection is not guaranteed to clear signal_pending() so clear that explicitly so the schedule() sleeps.
For now the vhost thread continues to exist and run work until the last file descriptor is closed and the release function is called as part of freeing struct file. To avoid hangs in the coredump rendezvous and when killing threads in a multi-threaded exec, the coredump code and de_thread have been modified to ignore vhost threads.
Removing the special case for exec appears to require teaching vhost_dev_flush how to directly complete transactions in case the vhost thread is no longer running.
Removing the special case for coredump rendezvous requires either the above fix needed for exec or moving the coredump rendezvous into get_signal.
Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Co-developed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
01e6aac7 |
| 18-May-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
signal: move show_unhandled_signals sysctl to its own file
The show_unhandled_signals sysctl is the only debug sysctl left in kernel/sysctl.c. We've been moving the sysctls out of kernel/sysctl.c to help avoid merge conflicts as the shared array gets out of hand.
This change simplifies sysctl registration by localizing it where it should go, at the cost of increasing the kernel size by 23 bytes. We accept this given that recent cleanups have already saved us 1465 bytes in the prior commits.
./scripts/bloat-o-meter vmlinux.3-remove-dev-table vmlinux.4-remove-debug-table
add/remove: 3/1 grow/shrink: 0/1 up/down: 177/-154 (23)
Function                                     old     new   delta
signal_debug_table                             -     128    +128
init_signal_sysctls                            -      33     +33
__pfx_init_signal_sysctls                      -      16     +16
sysctl_init_bases                             85      59     -26
debug_table                                  128       -    -128
Total: Before=21256967, After=21256990, chg +0.00%
Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
bcb7ee79 |
| 16-Mar-2023 |
Dmitry Vyukov <dvyukov@google.com> |
posix-timers: Prefer delivery of signals to the current thread
POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main thread of a thread group for signal delivery. However, this has a significant downside: it requires waking up a potentially idle thread.
Instead, prefer to deliver signals to the current thread (in the same thread group) if SIGEV_THREAD_ID is not set by the user. This does not change guaranteed semantics, since POSIX process CPU time timers have never guaranteed that signal delivery is to a specific thread (without SIGEV_THREAD_ID set).
The effect is that queueing the signal no longer wakes up potentially idle threads, and the kernel is no longer biased towards delivering the timer signal to any particular thread (which better distributes the timer signals esp. when multiple timers fire concurrently).
Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230316123028.2890338-1-elver@google.com
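The target choice described in this entry can be modelled as follows. It is a sketch: `struct thread_model` and `timer_signal_target()` are hypothetical, and `thread_directed` stands in for the SIGEV_THREAD_ID case.

```c
#include <stdbool.h>

/* Hypothetical model of a thread: its own id and its thread-group id. */
struct thread_model {
    int tid;
    int tgid;
};

/*
 * Pick the thread that receives a process CPU-time timer signal.
 * thread_directed models SIGEV_THREAD_ID: the signal must go to 'target'.
 * Otherwise prefer 'current' when it belongs to the same thread group,
 * which avoids waking a possibly idle thread.
 */
const struct thread_model *
timer_signal_target(const struct thread_model *target,
                    const struct thread_model *current,
                    bool thread_directed)
{
    if (!thread_directed && target->tgid == current->tgid)
        return current;
    return target;
}
```

When the timer fires in a worker thread of the target process, the model hands the signal to that worker instead of the group leader.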
|
#
af7f588d |
| 22-Nov-2022 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
sched: Introduce per-memory-map concurrency ID
This feature allows the scheduler to expose a per-memory map concurrency ID to user-space. This concurrency ID is within the possible cpus range, and is temporarily (and uniquely) assigned while threads are actively running within a memory map. If a memory map has fewer threads than cores, or is limited to run on few cores concurrently through sched affinity or cgroup cpusets, the concurrency IDs will be values close to 0, thus allowing efficient use of user-space memory for per-cpu data structures.
This feature is meant to be exposed by a new rseq thread area field.
The primary purpose of this feature is to do the heavy-lifting needed by memory allocators to allow them to use per-cpu data structures efficiently in the following situations:
- Single-threaded applications,
- Multi-threaded applications on large systems (many cores) with limited cpu affinity mask,
- Multi-threaded applications on large systems (many cores) with restricted cgroup cpuset per container.
One of the key concerns from scheduler maintainers is the overhead associated with additional spin locks or atomic operations in the scheduler fast-path. This is why the following optimization is implemented.
On context switch between threads belonging to the same memory map, transfer the mm_cid from prev to next without any atomic ops. This takes care of use-cases involving frequent context switch between threads belonging to the same memory map.
Additional optimizations can be done if the spin locks added when context switching between threads belonging to different memory maps end up being a performance bottleneck. Those are left out of this patch though. A performance impact would have to be clearly demonstrated to justify the added complexity.
The credit goes to Paul Turner (Google) for the original virtual cpu id idea. This feature is implemented based on the discussions with Paul Turner and Peter Oskolkov (Google), but I took the liberty to implement scheduler fast-path optimizations and my own NUMA-awareness scheme. The rumor has it that Google have been running a rseq vcpu_id extension internally in production for a year. The tcmalloc source code indeed has comments hinting at a vcpu_id prototype extension to the rseq system call [1].
The following benchmarks do not show any significant overhead added to the scheduler context switch by this feature:
* perf bench sched messaging (process)
Baseline: 86.5±0.3 ms
With mm_cid: 86.7±2.6 ms
* perf bench sched messaging (threaded)
Baseline: 84.3±3.0 ms
With mm_cid: 84.7±2.6 ms
* hackbench (process)
Baseline: 82.9±2.7 ms
With mm_cid: 82.9±2.9 ms
* hackbench (threaded)
Baseline: 85.2±2.6 ms
With mm_cid: 84.4±2.9 ms
[1] https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linux_syscall_support.h#L26
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221122203932.231377-8-mathieu.desnoyers@efficios.com
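Why the concurrency IDs stay close to 0 can be illustrated with a lowest-free-ID allocator. This is a simplified single-threaded model, not the scheduler code: `cid_get()` and `cid_put()` are hypothetical names, the 64-bit mask limits the model to 64 IDs, and the locking and the lock-free hand-over between threads of the same memory map are left out.

```c
#include <stdint.h>

/*
 * Take the lowest free concurrency ID below nr_ids and mark it used.
 * Returns the ID, or -1 if all IDs are taken. The model supports at
 * most 64 IDs because the in-use set is a single 64-bit mask.
 */
int cid_get(uint64_t *used_mask, int nr_ids)
{
    for (int id = 0; id < nr_ids && id < 64; id++) {
        uint64_t bit = UINT64_C(1) << id;

        if (!(*used_mask & bit)) {
            *used_mask |= bit;
            return id;
        }
    }
    return -1;
}

/* Release an ID once the thread stops running within the memory map. */
void cid_put(uint64_t *used_mask, int id)
{
    *used_mask &= ~(UINT64_C(1) << id);
}
```

Because the lowest free ID is always reused first, a memory map with two concurrently running threads only ever sees IDs 0 and 1 in this model, however many CPUs the machine has.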
|
#
3a017d63 |
| 28-Nov-2022 |
haifeng.xu <haifeng.xu@shopee.com> |
signal: Initialize the info in ksignal
When handling the SIGNAL_GROUP_EXIT flag, the info in ksignal isn't cleared. However, the info acquired by dequeue_synchronous_signal/dequeue_signal is initialized and can be safely used. Fortunately, the fatal signal process just uses the si_signo and doesn't use any other member. Even so, initializing it before use is safer.
Signed-off-by: haifeng.xu <haifeng.xu@shopee.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221128065606.19570-1-haifeng.xu@shopee.com
|
#
6a542d1d |
| 08-Jun-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
kill signal_pt_regs()
Once upon a time it was used on hot paths, but that has not been true since 2013. IOW, there's no point in an arch-optimized equivalent of task_pt_regs(current) - the remaining two users are not worth bothering with.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|