#
eb522866 |
| 06-Mar-2024 |
Gang Li Subject: padata: dispatch works on <gang.li@linux.dev> |
Author: Gang Li padata: dispatch works on
different nodes Date: Thu, 22 Feb 2024 22:04:17 +0800
When a group of tasks that access different nodes are scheduled on the same node, they may encounter
Author: Gang Li padata: dispatch works on
different nodes Date: Thu, 22 Feb 2024 22:04:17 +0800
When a group of tasks that access different nodes are scheduled on the same node, they may encounter bandwidth bottlenecks and access latency.
Thus, numa_aware flag is introduced here, allowing tasks to be distributed across different nodes to fully utilize the advantage of multi-node systems.
Link: https://lkml.kernel.org/r/20240222140422.393911-5-gang.li@linux.dev Signed-off-by: Gang Li <ligang.bdlg@bytedance.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
7ddc21e3 |
| 16-Oct-2023 |
WangJinchao <wangjinchao@xfusion.com> |
padata: Fix refcnt handling in padata_free_shell()
In a high-load arm64 environment, the pcrypt_aead01 test in LTP can lead to system UAF (Use-After-Free) issues. Due to the lengthy analysis of the
padata: Fix refcnt handling in padata_free_shell()
In a high-load arm64 environment, the pcrypt_aead01 test in LTP can lead to system UAF (Use-After-Free) issues. Due to the lengthy analysis of the pcrypt_aead01 function call, I'll describe the problem scenario using a simplified model:
Suppose there's a user of padata named `user_function` that adheres to the padata requirement of calling `padata_free_shell` after `serial()` has been invoked, as demonstrated in the following code:
```c struct request { struct padata_priv padata; struct completion *done; };
void parallel(struct padata_priv *padata) { do_something(); }
void serial(struct padata_priv *padata) { struct request *request = container_of(padata, struct request, padata); complete(request->done); }
void user_function() { DECLARE_COMPLETION(done) padata->parallel = parallel; padata->serial = serial; padata_do_parallel(); wait_for_completion(&done); padata_free_shell(); } ```
In the corresponding padata.c file, there's the following code:
```c static void padata_serial_worker(struct work_struct *serial_work) { ... cnt = 0;
while (!list_empty(&local_list)) { ... padata->serial(padata); cnt++; }
local_bh_enable();
if (refcount_sub_and_test(cnt, &pd->refcnt)) padata_free_pd(pd); } ```
Because of the high system load and the accumulation of unexecuted softirq at this moment, `local_bh_enable()` in padata takes longer to execute than usual. Subsequently, when accessing `pd->refcnt`, `pd` has already been released by `padata_free_shell()`, resulting in a UAF issue with `pd->refcnt`.
The fix is straightforward: add `refcount_dec_and_test` before calling `padata_free_pd` in `padata_free_shell`.
Fixes: 07928d9bfc81 ("padata: Remove broken queue flushing")
Signed-off-by: WangJinchao <wangjinchao@xfusion.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
8f4f68e7 |
| 04-Sep-2023 |
Lu Jialin <lujialin4@huawei.com> |
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds. "echo 0 > /proc/sys/kerne
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call trace: __switch_to+0x98/0xe0 __schedule+0x6c4/0xf40 schedule+0xd8/0x1b4 schedule_timeout+0x474/0x560 wait_for_common+0x368/0x4e0 wait_for_completion+0x20/0x30 wait_for_completion+0x20/0x30 test_aead_vec_cfg+0xab4/0xd50 test_aead+0x144/0x1f0 alg_test_aead+0xd8/0x1e0 alg_test+0x634/0x890 cryptomgr_test+0x40/0x70 kthread+0x1e0/0x220 ret_from_fork+0x10/0x18 Kernel panic - not syncing: hung_task: blocked tasks
For padata_do_parallel, when the return err is 0 or -EBUSY, it will call wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal case, aead_request_complete() will be called in pcrypt_aead_serial and the return err is 0 for padata_do_parallel. But, when pinst->flags is PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it won't call aead_request_complete(). Therefore, test_aead_vec_cfg will hung at wait_for_completion(&wait->completion), which will cause hungtask.
The problem comes as following: (padata_do_parallel) | rcu_read_lock_bh(); | err = -EINVAL; | (padata_replace) | pinst->flags |= PADATA_RESET; err = -EBUSY | if (pinst->flags & PADATA_RESET) | rcu_read_unlock_bh() | return err
In order to resolve the problem, we replace the return err -EBUSY with -EAGAIN, which means parallel_data is changing, and the caller should call it again.
v3: remove retry and just change the return err. v2: introduce padata_try_do_parallel() in pcrypt_aead_encrypt and pcrypt_aead_decrypt to solve the hungtask.
Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Guo Zihua <guozihua@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
f84155ca |
| 23-Feb-2023 |
Anthony Yznaga <anthony.yznaga@oracle.com> |
padata: use alignment when calculating the number of worker threads
For multithreaded jobs the computed chunk size is rounded up by the caller-specified alignment. However, the number of worker thre
padata: use alignment when calculating the number of worker threads
For multithreaded jobs the computed chunk size is rounded up by the caller-specified alignment. However, the number of worker threads to use is computed using the minimum chunk size without taking alignment into account. A sufficiently large alignment value can result in too many worker threads being allocated for the job.
Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
0bedc992 |
| 17-Feb-2023 |
Thomas Weißschuh <linux@weissschuh.net> |
padata: Make kobj_type structure constant
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type.
Take advantage of this to c
padata: Make kobj_type structure constant
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definition to prevent modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
0d24f1b7 |
| 13-Dec-2022 |
Nathan Chancellor <nathan@kernel.org> |
padata: Mark padata_work_init() as __ref
When building arm64 allmodconfig + ThinLTO with clang and a proposed modpost update to account for -ffuncton-sections, the following warning appears:
WARN
padata: Mark padata_work_init() as __ref
When building arm64 allmodconfig + ThinLTO with clang and a proposed modpost update to account for -ffuncton-sections, the following warning appears:
WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text) WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text)
LLVM has optimized padata_work_init() to include the address of padata_mt_helper() directly because it inlined the other call to padata_work_init() with padata_parallel_worker(), meaning the remaining uses of padata_work_init() use padata_mt_helper() as the work_fn argument. This optimization causes modpost to complain since padata_work_init() is not __init, whereas padata_mt_helper() is.
Since padata_work_init() is only called from __init code when padata_mt_helper() is passed as the work_fn argument, mark padata_work_init() as __ref, which makes it clear to modpost that this scenario is okay.
Suggested-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
show more ...
|
#
57ddfecc |
| 17-Nov-2022 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: Fix list iterator in padata_do_serial()
list_for_each_entry_reverse() assumes that the iterated list is nonempty and that every list_head is embedded in the same type, but its use in padata_
padata: Fix list iterator in padata_do_serial()
list_for_each_entry_reverse() assumes that the iterated list is nonempty and that every list_head is embedded in the same type, but its use in padata_do_serial() breaks both rules.
This doesn't cause any issues now because padata_priv and padata_list happen to have their list fields at the same offset, but we really shouldn't be relying on that.
Fixes: bfde23ce200e ("padata: unbind parallel jobs from specific CPUs") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
34c3a47d |
| 17-Nov-2022 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: Always leave BHs disabled when running ->parallel()
A deadlock can happen when an overloaded system runs ->parallel() in the context of the current task:
padata_do_parallel ->para
padata: Always leave BHs disabled when running ->parallel()
A deadlock can happen when an overloaded system runs ->parallel() in the context of the current task:
padata_do_parallel ->parallel() pcrypt_aead_enc/dec padata_do_serial spin_lock(&reorder->lock) // BHs still enabled <interrupt> ... __do_softirq ... padata_do_serial spin_lock(&reorder->lock)
It's a bug for BHs to be on in _do_serial as Steffen points out, so ensure they're off in the "current task" case like they are in padata_parallel_worker to avoid this situation.
Reported-by: syzbot+bc05445bc14148d51915@syzkaller.appspotmail.com Fixes: 4611ce224688 ("padata: allocate work structures for parallel jobs from a pool") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
1c4cafd1 |
| 23-Jan-2022 |
Yury Norov <yury.norov@gmail.com> |
padata: replace cpumask_weight with cpumask_empty in padata.c
padata_do_parallel() calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_em
padata: replace cpumask_weight with cpumask_empty in padata.c
padata_do_parallel() calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally.
Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
cedcf527 |
| 22-Aug-2021 |
Cai Huoqing <caihuoqing@baidu.com> |
padata: Remove repeated verbose license text
remove it because SPDX-License-Identifier is already used
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Acked-by: Daniel Jordan <daniel.m.jordan@ora
padata: Remove repeated verbose license text
remove it because SPDX-License-Identifier is already used
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
80771c82 |
| 03-Aug-2021 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
padata: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and
padata: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged.
Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: linux-crypto@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
d5ee8e75 |
| 20-Jul-2021 |
Xiyu Yang <xiyuyang19@fudan.edu.cn> |
padata: Convert from atomic_t to refcount_t on parallel_data->refcnt
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free s
padata: Convert from atomic_t to refcount_t on parallel_data->refcnt
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
1b0df11f |
| 02-Sep-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: fix possible padata_works_lock deadlock
syzbot reports,
WARNING: inconsistent lock state 5.9.0-rc2-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ
padata: fix possible padata_works_lock deadlock
syzbot reports,
WARNING: inconsistent lock state 5.9.0-rc2-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. syz-executor.0/26715 takes: (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220 {IN-SOFTIRQ-W} state was registered at: spin_lock include/linux/spinlock.h:354 [inline] padata_do_parallel kernel/padata.c:220 ... __do_softirq kernel/softirq.c:298 ... sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581
Possible unsafe locking scenario:
CPU0 ---- lock(padata_works_lock); <Interrupt> lock(padata_works_lock);
padata_do_parallel() takes padata_works_lock with softirqs enabled, so a deadlock is possible if, on the same CPU, the lock is acquired in process context and then softirq handling done in an interrupt leads to the same path.
Fix by leaving softirqs disabled while do_parallel holds padata_works_lock.
Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
f601c725 |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: remove padata_parallel_queue
Only its reorder field is actually used now, so remove the struct and embed @reorder directly in parallel_data.
No functional change, just a cleanup.
Signed-of
padata: remove padata_parallel_queue
Only its reorder field is actually used now, so remove the struct and embed @reorder directly in parallel_data.
No functional change, just a cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
3f257191 |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: fold padata_alloc_possible() into padata_alloc()
There's no reason to have two interfaces when there's only one caller. Removing _possible saves text and simplifies future changes.
Signed-o
padata: fold padata_alloc_possible() into padata_alloc()
There's no reason to have two interfaces when there's only one caller. Removing _possible saves text and simplifies future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
d69e037b |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: remove effective cpumasks from the instance
A padata instance has effective cpumasks that store the user-supplied masks ANDed with the online mask, but this middleman is unnecessary. paralle
padata: remove effective cpumasks from the instance
A padata instance has effective cpumasks that store the user-supplied masks ANDed with the online mask, but this middleman is unnecessary. parallel_data keeps the same information around. Removing this saves text and code churn in future changes.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
cec00e6e |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: inline single call of pd_setup_cpumasks()
pd_setup_cpumasks() has only one caller. Move its contents inline to prepare for the next cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@o
padata: inline single call of pd_setup_cpumasks()
pd_setup_cpumasks() has only one caller. Move its contents inline to prepare for the next cleanup.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
350ef051 |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: remove stop function
padata_stop() has two callers and is unnecessary in both cases. When pcrypt calls it before padata_free(), it's being unloaded so there are no outstanding padata jobs[0
padata: remove stop function
padata_stop() has two callers and is unnecessary in both cases. When pcrypt calls it before padata_free(), it's being unloaded so there are no outstanding padata jobs[0]. When __padata_free() calls it, it's either along the same path or else pcrypt initialization failed, which of course means there are also no outstanding jobs.
Removing it simplifies padata and saves text.
[0] https://lore.kernel.org/linux-crypto/20191119225017.mjrak2fwa5vccazl@gondor.apana.org.au/
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
bd25b488 |
| 14-Jul-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: remove start function
padata_start() is only used right after pcrypt allocates an instance with all possible CPUs, when PADATA_INVALID can't happen, so there's no need for a separate "start"
padata: remove start function
padata_start() is only used right after pcrypt allocates an instance with all possible CPUs, when PADATA_INVALID can't happen, so there's no need for a separate "start" step. It can be done during allocation to save text, make using padata easier, and avoid unneeded calls in the future.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
e04ec0de |
| 08-Jun-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: upgrade smp_mb__after_atomic to smp_mb in padata_do_serial
A 5.7 kernel hangs during a tcrypt test of padata that waits for an AEAD request to finish. This is only seen on large machines ru
padata: upgrade smp_mb__after_atomic to smp_mb in padata_do_serial
A 5.7 kernel hangs during a tcrypt test of padata that waits for an AEAD request to finish. This is only seen on large machines running many concurrent requests.
The issue is that padata never serializes the request. The removal of the reorder_objects atomic missed that the memory barrier in padata_do_serial() depends on it.
Upgrade the barrier from smp_mb__after_atomic to smp_mb to get correct ordering again.
Fixes: 3facced7aeed1 ("padata: remove reorder_objects") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
004ed426 |
| 03-Jun-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: add basic support for multithreaded jobs
Sometimes the kernel doesn't take full advantage of system memory bandwidth, leading to a single CPU spending excessive time in initialization paths
padata: add basic support for multithreaded jobs
Sometimes the kernel doesn't take full advantage of system memory bandwidth, leading to a single CPU spending excessive time in initialization paths where the data scales with memory size.
Multithreading naturally addresses this problem.
Extend padata, a framework that handles many parallel yet singlethreaded jobs, to also handle multithreaded jobs by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them.
This is inspired by work from Pavel Tatashin and Steve Sistare.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-5-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
4611ce22 |
| 03-Jun-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: allocate work structures for parallel jobs from a pool
padata allocates per-CPU, per-instance work structs for parallel jobs. A do_parallel call assigns a job to a sequence number and hashe
padata: allocate work structures for parallel jobs from a pool
padata allocates per-CPU, per-instance work structs for parallel jobs. A do_parallel call assigns a job to a sequence number and hashes the number to a CPU, where the job will eventually run using the corresponding work.
This approach fit with how padata used to bind a job to each CPU round-robin, makes less sense after commit bfde23ce200e6 ("padata: unbind parallel jobs from specific CPUs") because a work isn't bound to a particular CPU anymore, and isn't needed at all for multithreaded jobs because they don't have sequence numbers.
Replace the per-CPU works with a preallocated pool, which allows sharing them between existing padata users and the upcoming multithreaded user. The pool will also facilitate setting NUMA-aware concurrency limits with later users.
The pool is sized according to the number of possible CPUs. With this limit, MAX_OBJ_NUM no longer makes sense, so remove it.
If the global pool is exhausted, a parallel job is run in the current task instead to throttle a system trying to do too much in parallel.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-4-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
f1b192b1 |
| 03-Jun-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: initialize earlier
padata will soon initialize the system's struct pages in parallel, so it needs to be ready by page_alloc_init_late().
The error return from padata_driver_init() triggers
padata: initialize earlier
padata will soon initialize the system's struct pages in parallel, so it needs to be ready by page_alloc_init_late().
The error return from padata_driver_init() triggers an initcall warning, so add a warning to padata_init() to avoid silent failure.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-3-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
305dacf7 |
| 03-Jun-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: remove exit routine
Patch series "padata: parallelize deferred page init", v3.
Deferred struct page init is a bottleneck in kernel boot--the biggest for us and probably others. Optimizing
padata: remove exit routine
Patch series "padata: parallelize deferred page init", v3.
Deferred struct page init is a bottleneck in kernel boot--the biggest for us and probably others. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests.
So, multithread deferred init to take full advantage of system memory bandwidth.
Extend padata, a framework that handles many parallel singlethreaded jobs, to handle multithreaded jobs as well by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. More documentation in patches 4 and 8.
This series is the first step in a project to address other memory proportional bottlenecks in the kernel such as pmem struct page init, vfio page pinning, hugetlb fallocate, and munmap. Deferred page init doesn't require concurrency limits, resource control, or priority adjustments like these other users will because it happens during boot when the system is otherwise idle and waiting for page init to finish.
This has been run on a variety of x86 systems and speeds up kernel boot by 4% to 49%, saving up to 1.6 out of 4 seconds. Patch 6 has more numbers.
This patch (of 8):
padata_driver_exit() is unnecessary because padata isn't built as a module and doesn't exit.
padata's init routine will soon allocate memory, so getting rid of the exit function now avoids pointless code to free it.
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Josh Triplett <josh@joshtriplett.org> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Link: http://lkml.kernel.org/r/20200527173608.2885243-1-daniel.m.jordan@oracle.com Link: http://lkml.kernel.org/r/20200527173608.2885243-2-daniel.m.jordan@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
3c2214b6 |
| 21-Apr-2020 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
padata: add separate cpuhp node for CPUHP_PADATA_DEAD
Removing the pcrypt module triggers this:
general protection fault, probably for non-canonical address 0xdead000000000122 CPU: 5 PID: 2
padata: add separate cpuhp node for CPUHP_PADATA_DEAD
Removing the pcrypt module triggers this:
general protection fault, probably for non-canonical address 0xdead000000000122 CPU: 5 PID: 264 Comm: modprobe Not tainted 5.6.0+ #2 Hardware name: QEMU Standard PC RIP: 0010:__cpuhp_state_remove_instance+0xcc/0x120 Call Trace: padata_sysfs_release+0x74/0xce kobject_put+0x81/0xd0 padata_free+0x12/0x20 pcrypt_exit+0x43/0x8ee [pcrypt]
padata instances wrongly use the same hlist node for the online and dead states, so __padata_free()'s second cpuhp remove call chokes on the node that the first poisoned.
cpuhp multi-instance callbacks only walk forward in cpuhp_step->list and the same node is linked in both the online and dead lists, so the list corruption that results from padata_alloc() adding the node to a second list without removing it from the first doesn't cause problems as long as no instances are freed.
Avoid the issue by giving each state its own node.
Fixes: 894c9ef9780c ("padata: validate cpumask without removed CPU during offline") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|