# f7345ccc | 10-Oct-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Fix rcuog wake-up from offline softirq
After a CPU has set itself offline and before it eventually calls rcutree_report_cpu_dead(), there are still opportunities for callbacks to be enqueued, for example from a softirq. When that happens on NOCB, the rcuog wake-up is deferred through an IPI to an online CPU in order not to call into the scheduler and risk arming the RT-bandwidth hrtimer after hrtimers have been migrated out and disabled.
But performing a synchronized IPI from a softirq is buggy as reported in the following scenario:
WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
Modules linked in: rcutorture torture
CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
RIP: 0010:smp_call_function_single
<IRQ>
 swake_up_one_online
 __call_rcu_nocb_wake
 __call_rcu_common
 ? rcu_torture_one_read
 call_timer_fn
 __run_timers
 run_timer_softirq
 handle_softirqs
 irq_exit_rcu
 ? tick_handle_periodic
 sysvec_apic_timer_interrupt
</IRQ>
Fix this by forcing the deferred rcuog wake-up through the NOCB timer when the CPU is offline. The actual wake-up will happen from rcutree_report_cpu_dead().
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU") Reviewed-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
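For illustration, here is a minimal sketch of the wake-up decision described above, with hypothetical simplified types (struct nocb_state, nocb_wake_after_enqueue()) standing in for the real rcu_data machinery rather than reproducing the kernel's actual __call_rcu_nocb_wake():

#include <linux/cpumask.h>
#include <linux/sched.h>
#include <linux/timer.h>

/* Hypothetical per-CPU state standing in for the real struct rcu_data. */
struct nocb_state {
	struct timer_list deferred_wake_timer;	/* the "NOCB timer" */
	struct task_struct *rcuog;
};

/*
 * Illustrative only: pick the rcuog wake-up path after enqueuing a
 * callback.  An offline CPU must not issue a synchronous IPI from
 * softirq, so it defers the wake-up through the NOCB timer and lets
 * rcutree_report_cpu_dead() perform the final wake-up.
 */
static void nocb_wake_after_enqueue(struct nocb_state *ns, int cpu)
{
	if (cpu_is_offline(cpu)) {
		if (!timer_pending(&ns->deferred_wake_timer))
			mod_timer(&ns->deferred_wake_timer, jiffies + 1);
		return;
	}
	/* CPU still online: waking the rcuog kthread directly is safe. */
	wake_up_process(ns->rcuog);
}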
# 7562eed2 | 13-Aug-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove superfluous memory barrier after bypass enqueue
Pre-GP accesses performed by the update side must be ordered against post-GP accesses performed by the readers. This is ensured by the bypass or nocb locking at enqueue time, followed by the fully ordered rnp locking initiated while callbacks are accelerated, and then propagated throughout the whole GP lifecycle associated with the callbacks.
Therefore the explicit barrier advertising ordering between bypass enqueue and rcuo wakeup is superfluous. If anything, it would only order the first bypass callback enqueue against the rcuo wakeup and ignore all the subsequent ones.
Remove the needless barrier.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# 1b022b87 | 13-Aug-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
A callback enqueuer currently wakes up the rcuo kthread if it is adding the first non-done callback of a CPU, whether the kthread is waiting on a grace period or not (unless the CPU is offline).
This looks like a desired behaviour because then the rcuo kthread doesn't wait for the end of the current grace period to handle the callback. It is accelerated right away and assigned to the next grace period. The GP kthread is notified about that fact and iterates with the upcoming GP without sleeping in-between.
However this best-case scenario is contradicted by a few details, depending on the situation:
1) If the callback is a non-bypass one queued with IRQs enabled, the wake up only occurs if no other pending callbacks are on the list. Therefore the theoretical "optimization" actually applies on rare occasions.
2) If the callback is a non-bypass one queued with IRQs disabled, the situation is similar with even more uncertainty due to the deferred wake up.
3) If the callback is lazy, a few jiffies don't make any difference.
4) If the callback is bypass, the wake up timer is programmed 2 jiffies ahead by rcuo in case the regular pending queue has been handled in the meantime. The rare storm of callbacks can otherwise wait for the currently elapsing grace period to be flushed and handled.
For all those reasons, the optimization is only theoretical and occasional. Therefore it is reasonable that callbacks enqueuers only wake up the rcuo kthread when it is not already waiting on a grace period to complete.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
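The resulting enqueue-side logic reduces to one cheap state check before deciding whether to wake rcuog. A hypothetical sketch, assuming a simplified waiting_on_gp flag instead of the real segmented-callback-list state:

#include <linux/sched.h>

/* Hypothetical, simplified view of the rcuog wake decision. */
struct nocb_gp_state {
	bool waiting_on_gp;		/* rcuog already sleeping until GP end */
	struct task_struct *rcuog;
};

static void maybe_wake_rcuog(struct nocb_gp_state *gp)
{
	/*
	 * If rcuog is already waiting for a grace period to complete,
	 * it will notice the new callback when that GP ends; waking it
	 * now would only buy a rare, marginal head start.
	 */
	if (gp->waiting_on_gp)
		return;

	wake_up_process(gp->rcuog);
}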
# 9139f932 | 13-Aug-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
After a CPU is marked offline and until it reaches its final trip to idle, rcuo has several opportunities to be woken up, either because a callback has been queued in the meantime or because rcutree_report_cpu_dead() has issued the final deferred NOCB wake up.
If RCU-boosting is enabled, RCU kthreads are set to SCHED_FIFO policy. And if RT-bandwidth is enabled, the related hrtimer might be armed. However this then happens after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage, which is broken as reported by the following warning:
Call trace:
 enqueue_hrtimer+0x7c/0xf8
 hrtimer_start_range_ns+0x2b8/0x300
 enqueue_task_rt+0x298/0x3f0
 enqueue_task+0x94/0x188
 ttwu_do_activate+0xb4/0x27c
 try_to_wake_up+0x2d8/0x79c
 wake_up_process+0x18/0x28
 __wake_nocb_gp+0x80/0x1a0
 do_nocb_deferred_wakeup_common+0x3c/0xcc
 rcu_report_dead+0x68/0x1ac
 cpuhp_report_idle_dead+0x48/0x9c
 do_idle+0x288/0x294
 cpu_startup_entry+0x34/0x3c
 secondary_start_kernel+0x138/0x158
Fix this by waking up rcuo using an IPI if necessary. Since the existing API to deal with this situation only handles swait queues, rcuo is only woken up from offline CPUs if it's not already waiting on a grace period. In the worst case some callbacks will just wait for a grace period to complete before being assigned to a subsequent one.
Reported-by: "Cheng-Jui Wang (王正睿)" <Cheng-Jui.Wang@mediatek.com> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# 1fcb932c | 03-Jul-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Simplify (de-)offloading state machine
Now that the (de-)offloading process can only apply to offline CPUs, there is no more concurrency between rcu_core and nocb kthreads. Also the mutation now happens on empty queues.
Therefore the state machine can be reduced to a single bit called SEGCBLIST_OFFLOADED. Simplify the transition as follows:
* Upon offloading: queue the rdp to be added to the rcuog list and wait for the rcuog kthread to set the SEGCBLIST_OFFLOADED bit. Unpark rcuo kthread.
* Upon de-offloading: Park rcuo kthread. Queue the rdp to be removed from the rcuog list and wait for the rcuog kthread to clear the SEGCBLIST_OFFLOADED bit.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
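A deliberately simplified sketch of the two transitions follows, assuming a hypothetical nocb_offload_state structure; in the real code the rdp is queued to the rcuog kthread, which flips the bit itself (elided here):

#include <linux/bitops.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical, simplified per-CPU offload state. */
struct nocb_offload_state {
	unsigned long flags;			/* only bit: OFFLOADED */
	struct task_struct *rcuo;
};
#define SEGCBLIST_OFFLOADED_BIT	0

/* Offload: hand the (offline) CPU's callbacks over to rcuo. */
static void nocb_offload(struct nocb_offload_state *st)
{
	/* In reality: queue the rdp for rcuog, which sets the bit. */
	set_bit(SEGCBLIST_OFFLOADED_BIT, &st->flags);
	kthread_unpark(st->rcuo);
}

/* De-offload: the reverse order, parking rcuo first. */
static void nocb_deoffload(struct nocb_offload_state *st)
{
	kthread_park(st->rcuo);
	/* In reality: queue the rdp for removal; rcuog clears the bit. */
	clear_bit(SEGCBLIST_OFFLOADED_BIT, &st->flags);
}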
# 32a9f26e | 29-Apr-2024 | Valentin Schneider <vschneid@redhat.com>
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, so replace "dyntick_idle" with "eqs" to drop the dyntick reference.
Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# bae6076e | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove SEGCBLIST_RCU_CORE
RCU core can't be running anymore while in the middle of (de-)offloading since this sort of transition now only applies to offline CPUs.
The SEGCBLIST_RCU_CORE state can therefore be removed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# d2e7f55f | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove halfway (de-)offloading handling from bypass
Bypass enqueue can't happen anymore in the middle of (de-)offloading since this sort of transition now only applies to offline CPUs.
The related safety check can therefore be removed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# 4e26ce42 | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: (De-)offload callbacks on offline CPUs only
Currently callbacks can be (de-)offloaded only on online CPUs. This involves an overly elaborated state machine in order to make sure that callbacks are always handled during the process while ensuring synchronization between rcu_core and NOCB kthreads.
The only potential user of NOCB (de-)offloading appears to be a nohz_full toggling interface through cpusets. And the general agreement is now to work toward toggling the nohz_full state on offline CPUs to simplify the whole picture.
Therefore, convert the (de-)offloading to only support offline CPUs. This involves the following changes:
* Call rcu_barrier() before deoffloading. An offline offloaded CPU may still carry callbacks in its queue ignored by rcutree_migrate_callbacks(). Those callbacks must all be flushed before switching to a regular queue because no more kthreads will handle those before the CPU ever gets re-onlined.
This means that further calls to rcu_barrier() will find an empty queue until the CPU goes through rcutree_report_cpu_starting(). As a result it is guaranteed that further rcu_barrier() won't try to lock the nocb_lock for that target and thus won't risk an imbalance.
Therefore barrier_mutex doesn't need to be locked anymore upon deoffloading.
* Assume the queue is empty before offloading, as rcutree_migrate_callbacks() took care of everything.
This means that further calls to rcu_barrier() will find an empty queue until the CPU goes through rcutree_report_cpu_starting(). As a result it is guaranteed that further rcu_barrier() won't risk a nocb_lock imbalance.
Therefore barrier_mutex doesn't need to be locked anymore upon offloading.
* No need to flush bypass anymore.
Further simplifications will follow in upcoming patches.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
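The ordering constraint on the de-offload side can be pictured as below, reusing the hypothetical helpers from the earlier sketch; this illustrates the sequence only, not the in-tree implementation:

#include <linux/rcupdate.h>

/* Hypothetical de-offload of an offline CPU (cf. the commit above). */
static void deoffload_offline_cpu(struct nocb_offload_state *st)
{
	/*
	 * The offline CPU may still hold callbacks that
	 * rcutree_migrate_callbacks() ignored: flush them all first,
	 * because no kthread will handle them once de-offloaded.
	 */
	rcu_barrier();

	nocb_deoffload(st);	/* see the earlier parking sketch */
}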
# 7121dd91 | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Introduce nocb mutex
The barrier_mutex is currently used to protect (de-)offloading operations and to prevent nocb_lock locking imbalance in rcu_barrier() and the shrinker, as well as misordered RCU barrier invocation.
Now that RCU (de-)offloading is going to happen on offline CPUs, an RCU barrier will have to be executed while transitioning from the offloaded to the de-offloaded state. And this can't happen while holding the barrier_mutex.
Introduce a NOCB mutex to protect (de-)offloading transitions. The barrier_mutex is still held for now when necessary to avoid barrier callbacks reordering and nocb_lock imbalance.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# 7be88a85 | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Assert no callbacks while nocb kthread allocation fails
When a NOCB CPU fails to create a nocb kthread on bringup, the CPU is then deoffloaded. The barrier mutex is locked at this stage. It is typically used to protect against concurrent (de-)offloading and/or concurrent rcu_barrier() that would otherwise risk a nocb locking imbalance. However:
* rcu_barrier() can't run concurrently if it's the boot CPU on early boot-up.
* rcu_barrier() can run concurrently if it's a secondary CPU but it is expected to see 0 callbacks on this target because it's the first time it boots.
* (de-)offloading can't happen concurrently with smp_init(), as rcutorture is initialized later, at least not before device_initcall(), and userspace isn't available yet.
* (de-)offloading can't happen concurrently with cpu_up(), courtesy of cpu_hotplug_lock.
But:
* The lazy shrinker might run concurrently with cpu_up(). It shouldn't try to grab the nocb_lock and risk an imbalance, because lazy_len is supposed to be 0, but be extra cautious anyway.
* Also be cautious about potential subtleties of resume from hibernation.
So keep the locking and add some assertions and comments.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
# 7aeba709 | 30-May-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN()
Checking for races against concurrent (de-)offloading implies the creation of !CONFIG_RCU_NOCB_CPU stubs to check if each relevant lock is held. For now this only concerns the nocb_lock, but more are to be expected.
Create instead a NOCB specific version of RCU_LOCKDEP_WARN() to avoid the proliferation of stubs.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
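A plausible shape for such a macro is sketched below; it reuses the existing RCU_LOCKDEP_WARN() and lockdep_is_held(), but the exact in-tree definition may differ:

#include <linux/lockdep.h>
#include <linux/rcupdate.h>

/* Sketch of a NOCB-specific lockdep assertion wrapper. */
#ifdef CONFIG_RCU_NOCB_CPU
#define RCU_NOCB_LOCKDEP_WARN(c, s)	RCU_LOCKDEP_WARN(c, s)
#else
#define RCU_NOCB_LOCKDEP_WARN(c, s)	do { } while (0)
#endif

/*
 * Usage sketch, e.g. on the enqueue path of an offloaded rdp:
 *
 *	RCU_NOCB_LOCKDEP_WARN(!lockdep_is_held(&rdp->nocb_lock),
 *			      "enqueue on offloaded rdp without nocb_lock");
 */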
# e4f78057 | 25-Apr-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove buggy bypass lock contention mitigation
The bypass lock contention mitigation assumes there can be at most 2 contenders on the bypass lock, following this scheme:
1) One kthread takes the bypass lock
2) Another one spins on it and increments the contended counter
3) A third one (a bypass enqueuer) sees the contended counter on and busy loops waiting on it to decrement.
However this assumption is wrong. Only one CPU can ever find the lock contended, because call_rcu() (the bypass enqueuer) is the only bypass lock acquire site that may not already hold the NOCB lock beforehand; all the other sites must first contend on the NOCB lock. Therefore step 2) is impossible.
The other problem is that the mitigation assumes that contenders all belong to the same rdp CPU, which is also impossible for a raw spinlock. In theory the warning could trigger if the enqueuer holds the bypass lock and another CPU flushes the bypass queue concurrently, but this is prevented by all flush users:
1) NOCB kthreads only flush if they successfully _tried_ to lock the bypass lock. So no contention management here.
2) Flush on callbacks migration happens remotely when the CPU is offline. No concurrency against bypass enqueue.
3) Flush on deoffloading happens either locally with IRQs disabled or remotely when the CPU is not yet online. No concurrency against bypass enqueue.
4) Flush on barrier entrain happens either locally with IRQs disabled or remotely when the CPU is offline. No concurrency against bypass enqueue.
For those reasons, the bypass lock contention mitigation isn't needed and is even wrong. Remove it, but keep the warning reporting a contended bypass lock on a remote CPU, to retain awareness of unexpected contention.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
# 483d5bf2 | 25-Apr-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Use kthread parking instead of ad-hoc implementation
Upon NOCB deoffloading, the rcuo kthread must be forced to sleep until the corresponding rdp is ever offloaded again. The deoffloader clears the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which then notices that change and clears in turn its SEGCBLIST_KTHREAD_CB flag before going to sleep, until it ever sees the SEGCBLIST_OFFLOADED flag again, should a re-offloading happen.
Upon NOCB offloading, the rcuo kthread must be forced to wake up and handle callbacks until the corresponding rdp is ever deoffloaded again. The offloader sets the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which then notices that change and sets in turn its SEGCBLIST_KTHREAD_CB flag before going to check callbacks, until it ever sees the SEGCBLIST_OFFLOADED flag cleared again, should a de-offloading happen again.
This is all a crude ad-hoc and error-prone kthread (un-)parking re-implementation.
Consolidate the behaviour with the appropriate API instead.
[ paulmck: Apply Qiang Zhang feedback provided in Link: below. ] Link: https://lore.kernel.org/all/20240509074046.15629-1-qiang.zhang1211@gmail.com/
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
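The consolidated behaviour boils down to the standard kthread parking pattern. A self-contained sketch of an rcuo-like kthread body using that API; cb_work() is a hypothetical stand-in for the real callback handling:

#include <linux/kthread.h>

/* Hypothetical placeholder for invoking the CPU's ready callbacks. */
static void cb_work(void *arg) { }

static int rcuo_like_thread(void *arg)
{
	for (;;) {
		if (kthread_should_stop())
			break;
		if (kthread_should_park())
			kthread_parkme();	/* sleeps while de-offloaded */
		cb_work(arg);
	}
	return 0;
}

/*
 * The (de-)offloader then simply calls kthread_park()/kthread_unpark()
 * instead of juggling SEGCBLIST_KTHREAD_CB by hand.
 */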
# 499d7e7e | 15-Nov-2023 | Frederic Weisbecker <frederic@kernel.org>
rcu: Rename jiffies_till_flush to jiffies_lazy_flush
The variable name jiffies_till_flush is too generic and therefore:
* It may shadow a global variable
* It doesn't say what it operates on
Make the name more precise, along with the related APIs.
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
# f3c4c007 | 17-Jan-2024 | Zqiang <qiang.zhang1211@gmail.com>
rcu/nocb: Check rdp_gp->nocb_timer in __call_rcu_nocb_wake()
Currently, only rdp_gp->nocb_timer is used. For the nocb_timer of a non-rdp_gp structure, timer_pending() always returns false. This commit therefore checks rdp_gp->nocb_timer in __call_rcu_nocb_wake().
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
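In other words, the deferred-wake timer lives on the NOCB group leader, so that is the timer whose pending state is meaningful. A tiny sketch with hypothetical simplified types (rdp_like, gp_leader) rather than the real struct rcu_data:

#include <linux/timer.h>

struct rdp_like {
	struct timer_list nocb_timer;
	struct rdp_like *gp_leader;	/* stands in for the group leader rdp */
};

static bool deferred_wake_already_armed(struct rdp_like *rdp)
{
	/* Checking rdp->nocb_timer here would always see "not pending". */
	return timer_pending(&rdp->gp_leader->nocb_timer);
}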
# dda98810 | 10-Jan-2024 | Zqiang <qiang.zhang1211@gmail.com>
rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()
For the kernels built with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y and CONFIG_RCU_LAZY=y, the following scenarios will trigger WARN_ON_ONCE() in the rcu_nocb_bypass_lock() and rcu_nocb_wait_contended() functions:
CPU2                                               CPU11
kthread
rcu_nocb_cb_kthread                                ksys_write
rcu_do_batch                                       vfs_write
rcu_torture_timer_cb                               proc_sys_write
__kmem_cache_free                                  proc_sys_call_handler
kmemleak_free                                      drop_caches_sysctl_handler
delete_object_full                                 drop_slab
__delete_object                                    shrink_slab
put_object                                         lazy_rcu_shrink_scan
call_rcu                                           rcu_nocb_flush_bypass
__call_rcu_commn                                   rcu_nocb_bypass_lock
                                                   raw_spin_trylock(&rdp->nocb_bypass_lock) fail
                                                   atomic_inc(&rdp->nocb_lock_contended);
rcu_nocb_wait_contended                            WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
 WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended))
                          |
                          |_ _ _ _ _ _ _ _ _ _same rdp and rdp->cpu != 11_ _ _ _ _ _ _ _ _ __|
Reproduce this bug with "echo 3 > /proc/sys/vm/drop_caches".
This commit therefore uses rcu_nocb_try_flush_bypass() instead of rcu_nocb_flush_bypass() in lazy_rcu_shrink_scan(). If the nocb_bypass queue is being flushed, then rcu_nocb_try_flush_bypass will return directly.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
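The shrinker-side idea is the classic trylock-and-back-off pattern. A simplified, hypothetical sketch (bypass_like, try_flush_bypass()) of the approach, not the actual rcu_nocb_try_flush_bypass():

#include <linux/spinlock.h>

struct bypass_like {
	raw_spinlock_t lock;
	/* ... bypass callback list ... */
};

/* Shrinker path: never spin or warn from a foreign CPU. */
static bool try_flush_bypass(struct bypass_like *b)
{
	if (!raw_spin_trylock(&b->lock))
		return false;	/* someone else is flushing: just back off */
	/* ... move bypass callbacks to the main list (elided) ... */
	raw_spin_unlock(&b->lock);
	return true;
}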
# afd4e696 | 09-Jan-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Re-arrange call_rcu() NOCB specific code
Currently the call_rcu() function interleaves NOCB and !NOCB enqueue code in a complicated way such that:
* The bypass enqueue code may or may not have enqueued and may or may not have locked the ->nocb_lock. Everything that follows is in a Schrödinger locking state for the unwary reviewer's eyes.
* The was_alldone variable is always set but only used in NOCB-related code.
* The NOCB wake up is distantly related to the locking hopefully performed by the bypass enqueue code that did not enqueue on the bypass list.
Unconfuse the whole thing and gather the NOCB and !NOCB specific enqueue code into their own functions.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
# b913c3fe | 09-Jan-2024 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Make IRQs disablement symmetric
Currently IRQs are disabled on call_rcu() and then depending on the context:
* If the CPU is in nocb mode:
- If the callback is enqueued in the bypass list, IRQs are re-enabled implicitly by rcu_nocb_try_bypass()
- If the callback is enqueued in the normal list, IRQs are re-enabled implicitly by __call_rcu_nocb_wake()
* If the CPU is NOT in nocb mode, IRQs are reenabled explicitly from call_rcu()
This makes the code a bit hard to follow, especially as it interleaves with nocb locking.
To make the IRQ flags coverage clearer and also in order to prepare for moving all the nocb enqueue code to its own function, always re-enable the IRQ flags explicitly from call_rcu().
Reviewed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
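The resulting shape of the IRQ handling can be sketched as below; is_nocb_cpu(), enqueue_nocb() and enqueue_regular() are hypothetical stand-ins for the real helpers:

#include <linux/irqflags.h>
#include <linux/types.h>

/* Hypothetical stand-ins for the real enqueue helpers. */
static bool is_nocb_cpu(void);
static void enqueue_nocb(void *head);
static void enqueue_regular(void *head);

/*
 * Sketch: call_rcu() both disables and re-enables IRQs itself, so the
 * NOCB and !NOCB enqueue paths never return with a different IRQ state
 * than the one they were entered with.
 */
static void call_rcu_like(void *head)
{
	unsigned long flags;

	local_irq_save(flags);
	if (is_nocb_cpu())
		enqueue_nocb(head);	/* must not re-enable IRQs itself */
	else
		enqueue_regular(head);
	local_irq_restore(flags);
}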
# 1e8e6951 | 15-Nov-2023 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove needless full barrier after callback advancing
A full barrier is issued from nocb_gp_wait() upon callbacks advancing to order grace period completion with callbacks execution.
However these two events are already ordered by the smp_mb__after_unlock_lock() barrier within the call to raw_spin_lock_rcu_node() that is necessary for callbacks advancing to happen.
The following litmus test shows the kind of guarantee that this barrier provides:
C smp_mb__after_unlock_lock
{}
// rcu_gp_cleanup()
P0(spinlock_t *rnp_lock, int *gpnum)
{
	// Grace period cleanup increase gp sequence number
	spin_lock(rnp_lock);
	WRITE_ONCE(*gpnum, 1);
	spin_unlock(rnp_lock);
}

// nocb_gp_wait()
P1(spinlock_t *rnp_lock, spinlock_t *nocb_lock, int *gpnum, int *cb_ready)
{
	int r1;

	// Call rcu_advance_cbs() from nocb_gp_wait()
	spin_lock(nocb_lock);
	spin_lock(rnp_lock);
	smp_mb__after_unlock_lock();
	r1 = READ_ONCE(*gpnum);
	WRITE_ONCE(*cb_ready, 1);
	spin_unlock(rnp_lock);
	spin_unlock(nocb_lock);
}

// nocb_cb_wait()
P2(spinlock_t *nocb_lock, int *cb_ready, int *cb_executed)
{
	int r2;

	// rcu_do_batch() -> rcu_segcblist_extract_done_cbs()
	spin_lock(nocb_lock);
	r2 = READ_ONCE(*cb_ready);
	spin_unlock(nocb_lock);

	// Actual callback execution
	WRITE_ONCE(*cb_executed, 1);
}

P3(int *cb_executed, int *gpnum)
{
	int r3;

	WRITE_ONCE(*cb_executed, 2);
	smp_mb();
	r3 = READ_ONCE(*gpnum);
}
exists (1:r1=1 /\ 2:r2=1 /\ cb_executed=2 /\ 3:r3=0) (* Bad outcome. *)
Here the bad outcome only occurs if the smp_mb__after_unlock_lock() is removed. This barrier orders the grace period completion against callbacks advancing and even later callbacks invocation, thanks to the opportunistic propagation via the ->nocb_lock to nocb_cb_wait().
Therefore the smp_mb() placed after callbacks advancing can be safely removed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
# ca16265a | 15-Nov-2023 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Remove needless LOAD-ACQUIRE
The LOAD-ACQUIRE access performed on rdp->nocb_cb_sleep advertises ordering of callback execution against grace period completion. However this is contradicted by the following:
* This LOAD-ACQUIRE doesn't pair with anything. The only counterpart barrier that can be found is the smp_mb() placed after callbacks advancing in nocb_gp_wait(). However the barrier is placed _after_ ->nocb_cb_sleep write.
* Callbacks can be concurrently advanced between the LOAD-ACQUIRE on ->nocb_cb_sleep and the call to rcu_segcblist_extract_done_cbs() in rcu_do_batch(), making any ordering based on ->nocb_cb_sleep broken.
* Both rcu_segcblist_extract_done_cbs() and rcu_advance_cbs() are called under the nocb_lock, the latter thereby already providing the desired ACQUIRE semantics.
Therefore it is safe to access ->nocb_cb_sleep with a simple compiler barrier.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
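The change amounts to dropping acquire semantics from this one load. A before/after sketch (illustrative, not the in-tree diff):

#include <linux/compiler.h>

/* Illustrative contrast for reading a ->nocb_cb_sleep-like flag. */
static bool read_cb_sleep(bool *cb_sleep)
{
	/* Before: an ACQUIRE load that paired with nothing. */
	/*	return smp_load_acquire(cb_sleep);  */

	/*
	 * After: a once-load is enough, since the nocb_lock taken
	 * around callback extraction already orders things.
	 */
	return READ_ONCE(*cb_sleep);
}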
# 2fbacff0 | 11-Sep-2023 | Qi Zheng <zhengqi.arch@bytedance.com>
rcu: dynamically allocate the rcu-lazy shrinker
Use new APIs to dynamically allocate the rcu-lazy shrinker.
Link: https://lkml.kernel.org/r/20230911094444.68966-16-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
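For reference, the dynamic shrinker API mentioned above is typically used as sketched below; the real conversion in kernel/rcu may differ in detail, and the two existing callbacks are only declared here:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/shrinker.h>

/* The existing rcu-lazy callbacks, declared for the sketch. */
static unsigned long lazy_rcu_shrink_count(struct shrinker *s, struct shrink_control *sc);
static unsigned long lazy_rcu_shrink_scan(struct shrinker *s, struct shrink_control *sc);

static int __init rcu_lazy_shrinker_init_sketch(void)
{
	struct shrinker *s = shrinker_alloc(0, "rcu-lazy");

	if (!s)
		return -ENOMEM;

	s->count_objects = lazy_rcu_shrink_count;
	s->scan_objects = lazy_rcu_shrink_scan;
	shrinker_register(s);
	return 0;
}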
# 3292ba02 | 05-Jul-2023 | Paul E. McKenney <paulmck@kernel.org>
rcu: Make the rcu_nocb_poll boot parameter usable via boot config
The rcu_nocb_poll kernel boot parameter is defined via early_param(), whose parsing functions are invoked from parse_early_param() which is in turn invoked by setup_arch(), which is very early indeed. It is invoked so early that the console output timestamps read 0.000000, in other words, before time begins.
This use of early_param() means that the rcu_nocb_poll kernel boot parameter cannot usefully be embedded into the kernel image. Yes, you can embed it, but setup_boot_config() is invoked from start_kernel() too late for it to be parsed.
But it makes no sense to parse this parameter so early. After all, it cannot do anything until the rcuog kthreads are created, which is long after rcu_init() time, let alone setup_boot_config() time.
This commit therefore switches the rcu_nocb_poll kernel boot parameter from early_param() to __setup(), which allows boot-config parsing of this parameter, in turn allowing it to be embedded into the kernel image.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
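The mechanical shape of the switch is roughly as follows (illustrative; the real parser lives in kernel/rcu/tree_nocb.h and sets the in-tree rcu_nocb_poll flag):

#include <linux/init.h>
#include <linux/types.h>

static bool rcu_nocb_poll_sketch;

/* __setup() handlers return 1 to indicate the parameter was consumed. */
static int __init parse_rcu_nocb_poll(char *arg)
{
	rcu_nocb_poll_sketch = true;
	return 1;
}
__setup("rcu_nocb_poll", parse_rcu_nocb_poll);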
# fbde57d2 | 29-Mar-2023 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Make shrinker iterate only over NOCB CPUs
Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating over the NOCB mask is enough for both counting and scanning. Just lock the mostly uncontended barrier mutex on counting as well in order to keep rcu_nocb_mask stable.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
# b96a8b0b | 29-Mar-2023 | Frederic Weisbecker <frederic@kernel.org>
rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
The ->lazy_len is only checked locklessly. Recheck under the ->nocb_lock to avoid spending more time on flushing/waking if not necessary. The ->lazy_len can still increment concurrently (from 1 to infinity) but under the ->nocb_lock we at least know for sure if there are lazy callbacks at all (->lazy_len > 0).
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
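This is the usual lockless-filter-then-locked-recheck pattern, sketched below with a hypothetical simplified structure in place of the real rcu_data fields:

#include <linux/compiler.h>
#include <linux/spinlock.h>

struct lazy_state {
	raw_spinlock_t nocb_lock;
	long lazy_len;
};

static long shrink_scan_sketch(struct lazy_state *st)
{
	/* Cheap lockless filter first ... */
	if (!READ_ONCE(st->lazy_len))
		return 0;

	raw_spin_lock(&st->nocb_lock);
	/* ... then recheck: only flush/wake if there really is work. */
	if (!st->lazy_len) {
		raw_spin_unlock(&st->nocb_lock);
		return 0;
	}
	/* ... flush bypass, wake rcuog, etc. (elided) ... */
	raw_spin_unlock(&st->nocb_lock);
	return 1;
}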