#
1ecd9d68 |
| 23-Aug-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Defer printing stall-warning backtrace when holding rcu_node lock
The rcu_dump_cpu_stacks() function holds the leaf rcu_node structure's ->lock when dumping the stacks of any CPUs stalling the current grace period. This lock is held to prevent confusion that would otherwise occur when the stalled CPU reported its quiescent state (and then went on to do unrelated things) just as the backtrace NMI was heading towards it.
This has worked well, but on larger systems has recently been observed to cause severe lock contention resulting in CSD-lock stalls and other general unhappiness.
This commit therefore does printk_deferred_enter() before acquiring the lock and printk_deferred_exit() after releasing it, thus deferring the overhead of actually outputting the stack trace out of that lock's critical section.
Reported-by: Rik van Riel <riel@surriel.com> Suggested-by: Rik van Riel <riel@surriel.com> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
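The pattern described in this commit, queueing output while a lock is held and emitting it only after the lock is released, can be illustrated in user space. The sketch below is an analogy under stated assumptions and is not kernel code: `struct deferred_log`, `log_deferred()`, `log_flush()`, `dump_stalled()` and the `node_lock_held` flag are hypothetical names, and a plain flag stands in for the rcu_node ->lock.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAP 1024

/* Hypothetical stand-in for a deferred-output buffer. */
struct deferred_log {
    char buf[LOG_CAP];
    size_t len;
};

static int node_lock_held;  /* hypothetical flag modelling the lock */

/* Queue a message; cheap enough to call while the lock is held. */
static void log_deferred(struct deferred_log *log, const char *msg)
{
    size_t n;

    assert(log && msg);
    n = strlen(msg);
    if (n > LOG_CAP - 1 - log->len)
        n = LOG_CAP - 1 - log->len;   /* truncate rather than overflow */
    memcpy(log->buf + log->len, msg, n);
    log->len += n;
    log->buf[log->len] = '\0';
}

/* Emit everything queued so far; returns the number of bytes that were queued. */
static size_t log_flush(struct deferred_log *log, FILE *out)
{
    size_t n;

    assert(log && out);
    assert(!node_lock_held);          /* slow output must not run under the lock */
    n = log->len;
    fwrite(log->buf, 1, n, out);
    log->len = 0;
    log->buf[0] = '\0';
    return n;
}

/* Record one line per stalled CPU under the "lock", print after dropping it. */
static size_t dump_stalled(const int *cpus, int ncpus, FILE *out)
{
    struct deferred_log log = { .len = 0 };
    char line[64];
    int i;

    node_lock_held = 1;               /* "acquire" */
    for (i = 0; i < ncpus; i++) {
        snprintf(line, sizeof(line), "stalled cpu %d\n", cpus[i]);
        log_deferred(&log, line);     /* no console I/O inside the critical section */
    }
    node_lock_held = 0;               /* "release" */

    return log_flush(&log, out);      /* the expensive part happens here */
}
```

The point of the design is that the critical section only pays for a memory copy, so other CPUs contending for the lock are not held up by slow console output.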
|
#
8c03273a |
| 20-Aug-2024 |
John Ogness <john.ogness@linutronix.de> |
rcu: Mark emergency sections in rcu stalls
Mark emergency sections wherever multiple lines of rcu stall information are generated. In an emergency section, every printk() call will attempt to directly flush to the consoles using the EMERGENCY priority.
Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240820063001.36405-35-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
|
#
9629936d |
| 16-Apr-2024 |
Valentin Schneider <vschneid@redhat.com> |
rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING. Reflect that change in the related helpers.
While at it, update a comment that still refers to rcu_dynticks_snap(), which was removed by commit:
7be2e6323b9b ("rcu: Remove full memory barrier on RCU stall printout")
Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
1dd01c06 |
| 02-Jul-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Summarize RCU CPU stall warnings during CSD-lock stalls
During CSD-lock stalls, the additional information output by RCU CPU stall warnings is usually redundant, flooding the console for no good reason. However, this has been the way things work for a few years. This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall kernel boot parameter that causes RCU CPU stall warnings to be abbreviated to a single line when there is at least one CPU that has been stuck waiting for a CSD lock for more than five seconds.
To make this abbreviated message happen with decent probability:
    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 8 \
        --configs "2*TREE01" --kconfig "CONFIG_CSD_LOCK_WAIT_DEBUG=y" \
        --bootargs "csdlock_debug=1 rcutorture.stall_cpu=200 \
        rcutorture.stall_cpu_holdoff=120 rcutorture.stall_cpu_irqsoff=1 \
        rcutree.csd_lock_suppress_rcu_stall=1 \
        rcupdate.rcu_exp_cpu_stall_timeout=5000" --trust-make
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
2ef2890b |
| 16-Apr-2024 |
Valentin Schneider <vschneid@redhat.com> |
context_tracking, rcu: Rename ct_dynticks_nmi_nesting_cpu() into ct_nmi_nesting_cpu()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning.
Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
bca9455d |
| 16-Apr-2024 |
Valentin Schneider <vschneid@redhat.com> |
context_tracking, rcu: Rename ct_dynticks_nesting_cpu() into ct_nesting_cpu()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning.
Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
a9fde9d1 |
| 16-Apr-2024 |
Valentin Schneider <vschneid@redhat.com> |
context_tracking, rcu: Rename ct_dynticks_cpu() into ct_rcu_watching_cpu()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING. Reflect that change in the related helpers.
Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
55911a9f |
| 15-May-2024 |
Frederic Weisbecker <frederic@kernel.org> |
rcu: Remove full memory barrier on RCU stall printout
The RCU stall printout fetches the EQS state of a CPU with a preceding full memory barrier. However, there is nothing to order this read against at this debugging stage. It is inherently racy when performed remotely.
Do a plain read instead.
This was the last user of rcu_dynticks_snap().
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
#
3758f7d9 |
| 01-Apr-2024 |
Nikita Kiryushin <kiryushin@ancud.ru> |
rcu: Fix buffer overflow in print_cpu_stall_info()
The rcuc-starvation output from print_cpu_stall_info() might overflow the buffer if there is a huge jiffies difference. The situation might seem improbable, but computers sometimes get very confused about time, which can result in full-sized integers, and, in this case, buffer overflow.
Also, the unsigned jiffies difference is printed using %ld, which is normally for signed integers. This is intentional for debugging purposes, but it is not obvious from the code.
This commit therefore changes sprintf() to snprintf() and adds a clarifying comment about the intention of the %ld format.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 245a62982502 ("rcu: Dump rcuc kthread status for CPUs not reporting quiescent state") Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
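The two points made above, bounding the write with snprintf() and deliberately printing an unsigned difference as signed, can be shown in a short user-space sketch. The function name and message text are hypothetical, and the explicit cast to `long` is an addition for the example (it assumes the usual two's-complement conversion) rather than something taken from the kernel source.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "now - last" into buf without ever writing
 * more than size bytes.  Returns the length snprintf() would have needed,
 * so a return value >= size means the output was truncated.
 */
static int format_jiffies_delta(char *buf, size_t size,
                                unsigned long now, unsigned long last)
{
    unsigned long delta;

    assert(buf && size > 0);
    delta = now - last;               /* unsigned wraparound is well defined */
    /*
     * Signed format on purpose: a confused clock yields a huge unsigned
     * delta, which is easier to spot as a small negative number.
     */
    return snprintf(buf, size, "last ran %ld jiffies ago", (long)delta);
}
```

With sprintf() the second call in a usage like `format_jiffies_delta(small, 16, 5, 10)` would write past a 16-byte buffer; with snprintf() the text is truncated and always NUL-terminated.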
|
#
09e077cf |
| 08-Mar-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Mark loads from rcu_state.n_online_cpus
The rcu_state.n_online_cpus value is only ever updated by CPU-hotplug operations, which are serialized. However, this value is read locklessly. This commit therefore marks those reads. While in the area, it also adds ASSERT_EXCLUSIVE_WRITER() calls just in case parallel CPU hotplug becomes a thing.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
#
c90b9e49 |
| 07-Mar-2024 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Bring diagnostic read of rcu_state.gp_flags into alignment
This commit adds READ_ONCE() to a lockless diagnostic read from rcu_state.gp_flags to avoid giving the compiler any chance whatsoever of confusing the diagnostic state printed.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
|
#
4e58aaee |
| 02-Nov-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Restrict access to RCU CPU stall notifiers
The RCU CPU stall notifiers can be useful for dumping state when tracking down delicate forward-progress bugs where NUMA effects cause cache lines to be delivered to a given CPU regularly, but always in a state that prevents that CPU from making forward progress. These bugs can be detected by the RCU CPU stall-warning mechanism, but in some cases, the stall-warning printk()s disrupt the forward-progress bug before any useful state can be obtained.
Unfortunately, the notifier mechanism added by commit 5b404fdabacf ("rcu: Add RCU CPU stall notifier") can make matters worse if used at all carelessly. For example, if the stall warning was caused by a lock not being released, then any attempt to acquire that lock in the notifier will hang. This will prevent not only the notifier from producing any useful output, but it will also prevent the stall-warning message from ever appearing.
This commit therefore hides this new RCU CPU stall notifier mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text contains a warning and explains the dangers of careless use, recommending lockless notifier code. In addition, a WARN() is triggered each time that an attempt is made to register a stall-warning notifier in kernels built with CONFIG_RCU_CPU_STALL_NOTIFIER=y.
This combination of measures will keep use of this mechanism confined to debug kernels and away from routine deployments.
[ paulmck: Apply Dan Carpenter feedback. ]
Fixes: 5b404fdabacf ("rcu: Add RCU CPU stall notifier") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
|
#
b96e7a5f |
| 05-Sep-2023 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
rcu/tree: Defer setting of jiffies during stall reset
There are instances where rcu_cpu_stall_reset() is called when jiffies did not get a chance to update for a long time. Before jiffies is updated, the CPU stall detector can go off, triggering false positives where a just-started grace period appears to be ages old. In the past, we disabled stall detection in rcu_cpu_stall_reset(); however, this got changed [1]. This is resulting in false positives in the KGDB use case [2].
Fix this by deferring the update of jiffies to the third run of the FQS loop. This is more robust, as, even if rcu_cpu_stall_reset() is called just before jiffies is read, we would end up pushing out the jiffies read by 3 more FQS loops. Meanwhile the CPU stall detection will be delayed and we will not get any false positives.
[1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/
[2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Tested with rcutorture.cpu_stall option as well to verify stall behavior with/without patch.
Tested-by: Huacai Chen <chenhuacai@loongson.cn> Reported-by: Binbin Zhou <zhoubinbin@loongson.cn> Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/ Suggested-by: Paul McKenney <paulmck@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
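The deferral described above can be modelled with a small countdown: a reset pushes the stall deadline out of reach, and only the third subsequent FQS pass re-arms it from a fresh time reading. The sketch below is a simplified user-space model; the structure, field and function names are hypothetical, and it ignores jiffies wraparound, which real kernel time comparisons must handle.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define FQS_DEFER_PASSES 3            /* "third run of the FQS loop" */

/* Hypothetical model of the stall-detector state. */
struct stall_state {
    int nr_fqs_left;                  /* FQS passes left before re-arming, 0 if none */
    unsigned long jiffies_stall;      /* time at which a stall would be reported */
};

/* Reset: do not read the (possibly stale) clock now, just defer. */
static void stall_reset(struct stall_state *s)
{
    assert(s);
    s->nr_fqs_left = FQS_DEFER_PASSES;
    s->jiffies_stall = ULONG_MAX;     /* no stall can be reported for now */
}

/* One FQS pass: the last deferred pass re-arms from the current time. */
static void fqs_pass(struct stall_state *s, unsigned long now,
                     unsigned long timeout)
{
    assert(s);
    if (s->nr_fqs_left > 0 && --s->nr_fqs_left == 0)
        s->jiffies_stall = now + timeout;
}

/* Simplified check that ignores counter wraparound. */
static bool stall_detected(const struct stall_state *s, unsigned long now)
{
    assert(s);
    return s->jiffies_stall != ULONG_MAX && now >= s->jiffies_stall;
}
```

Because the deadline is computed from the time seen on the third pass, a stale time value at reset can no longer make a just-started grace period look old.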
|
#
5b404fda |
| 15-Aug-2023 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Add RCU CPU stall notifier
It is sometimes helpful to have a way for the subsystem causing the stall to dump its state when an RCU CPU stall occurs. This commit therefore bases rcu_stall_chain_notifier_register() and rcu_stall_chain_notifier_unregister() on atomic notifiers in order to provide this functionality.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
243d5ab3 |
| 24-Jul-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Eliminate check_cpu_stall() duplicate code
The code and comments of self-detected and other-detected RCU CPU stall warnings are identical except for the output function. This commit therefore refactors so as to consolidate the duplicate code.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
f3efe02f |
| 12-Jul-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Don't redump the stalled CPU where RCU GP kthread last ran
The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks(). If the CPU on which the RCU GP kthread last ran is stalled, its stack does not need to be dumped again. We can find the corresponding backtrace by searching for the printed CPU ID.
For example:
[ 87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3 <--------|
... ... |
[ 89.385007] NMI backtrace for cpu 3 <--------|
[ 89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--|
[ 89.385188] Hardware name: linux,dummy-virt (DT)
[ 89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[ 89.385204] pc : arch_cpu_idle+0x40/0xc0
[ 89.385211] lr : arch_cpu_idle+0x2c/0xc0
... ...
[ 89.385566] Call trace:
[ 89.385574] arch_cpu_idle+0x40/0xc0
[ 89.385581] default_idle_call+0x100/0x450
[ 89.385589] cpuidle_idle_call+0x2f8/0x460
[ 89.385596] do_idle+0x1dc/0x3d0
[ 89.385604] cpu_startup_entry+0x5c/0xb0
[ 89.385613] secondary_start_kernel+0x35c/0x520
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
b934b7ff |
| 12-Jul-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
The rcu_check_gp_kthread_starvation() function uses task_cpu() to sample the last CPU that the grace-period kthread ran on, and task_cpu() samples the thread_info structure's ->cpu field. But this field will always contain a number corresponding to a CPU that was online some time in the past, thus never a negative number. This invariant is checked by a WARN_ON_ONCE() in set_task_cpu().
This means that if the grace-period kthread exists, that is, if the "gpk" local variable is non-NULL, the "cpu" local variable will be non-negative. This in turn means that the existing check for non-negative "cpu" is redundant with the enclosing check for non-NULL "gpk".
This commit therefore removes the redundant check of "cpu".
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
|
#
bcb48185 |
| 12-Jul-2023 |
Jiri Slaby <jirislaby@kernel.org> |
tty: sysrq: switch sysrq handlers from int to u8
The parameter passed to sysrq handlers is a key (a character). So change the type from 'int' to 'u8'. Let it specifically be 'u8' for two reasons:
* unsigned: unsigned values come from the upper layers (devices) and the tty layer assumes unsigned in most places, and
* 8-bit: as that is what all the layers built on top of tty are supposed to use one day. (Currently, we use mostly 'unsigned char' and in some places still only 'char', but that also translates to the former thanks to -funsigned-char.)
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> # DRM Acked-by: WANG Xuerui <git@xen0n.name> # loongarch Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20230712081811.29004-3-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
84ec7c20 |
| 06-Dec-2022 |
Paul E. McKenney <paulmck@kernel.org> |
rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
The maximum value of RCU CPU stall-warning timeouts has historically been five minutes (300 seconds). However, the recently introduced expedited RCU CPU stall-warning timeout is instead limited to 21 seconds. This causes problems for CI/fuzzing services such as syzkaller by obscuring the issue in question with expedited RCU CPU stall-warning timeout splats.
This commit therefore sets the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option's upper bound to 300000 milliseconds, which is 300 seconds (AKA 5 minutes).
[ paulmck: Apply feedback from Hillf Danton. ]
[ paulmck: Apply feedback from Geert Uytterhoeven. ]
Reported-by: Dave Chinner <david@fromorbit.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
3ab955de |
| 19-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Align the output of RCU CPU stall warning messages
Time stamps are added to the output in kernels built with CONFIG_PRINTK_TIME=y, which causes misaligned output. Therefore, replace pr_cont() with pr_err(), which fixes alignment and gets rid of a couple of despised pr_cont() calls.
Before:
[ 37.567343] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 37.567839] rcu: 0-....: (1500 ticks this GP) idle=***
[ 37.568270] (t=1501 jiffies g=4717 q=28 ncpus=4)
[ 37.568668] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #8
After:
[ 36.762074] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 36.762543] rcu: 0-....: (1499 ticks this GP) idle=***
[ 36.763003] rcu: (t=1500 jiffies g=5097 q=27 ncpus=4)
[ 36.763522] CPU: 0 PID: 313 Comm: test0 Not tainted 6.1.0-rc4 #9
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
#
be42f00b |
| 19-Nov-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
rcu: Add RCU stall diagnosis information
Because RCU CPU stall warnings are driven from the scheduling-clock interrupt handler, a workload consisting of a very large number of short-duration hardware interrupts can result in misleading stall-warning messages. On systems supporting only a single level of interrupts, that is, where interrupt handlers cannot be interrupted, this can produce misleading diagnostics. The stack traces will show the innocent-bystander interrupted task, not the interrupts that are at the very least exacerbating the stall.
This situation can be improved by displaying the number of interrupts and the CPU time that they have consumed. Diagnosing other types of stalls can be eased by also providing the count of softirqs and the CPU time that they consumed as well as the number of context switches and the task-level CPU time consumed.
Consider the following output given this change:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:     0-....: (1250 ticks this GP) <omitted>
rcu:          hardirqs   softirqs   csw/system
rcu:  number:      624         45            0
rcu: cputime:       69          1         2425   ==> 2500(ms)
This output shows that the number of hard and soft interrupts is small, there are no context switches, and the system takes up a lot of time. This indicates that the current task is looping with preemption disabled.
The impact on system performance is negligible because the snapshot is recorded only once for all continuous RCU stalls.
This added debugging information is suppressed by default and can be enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or by booting with rcupdate.rcu_cpu_stall_cputime=1.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
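The reasoning applied to the sample output above (few interrupts, no context switches, nearly all of the 2500 ms window spent in task-level system time: 69 + 1 + 2425 = 2495 ms) can be written down as a small heuristic. The sketch below is illustrative only: the structure, the function name and the 90% threshold are assumptions made for this example and do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-stall deltas, mirroring the columns of the sample output. */
struct stall_snap {
    unsigned long hardirqs;           /* number of hard interrupts */
    unsigned long softirqs;           /* number of softirqs */
    unsigned long csw;                /* number of context switches */
    unsigned long t_hardirq_ms;       /* CPU time in hardirq, ms */
    unsigned long t_softirq_ms;       /* CPU time in softirq, ms */
    unsigned long t_system_ms;        /* task-level system time, ms */
};

/* Total CPU time accounted for in the snapshot. */
static unsigned long snap_total_ms(const struct stall_snap *d)
{
    assert(d);
    return d->t_hardirq_ms + d->t_softirq_ms + d->t_system_ms;
}

/*
 * Heuristic: no context switches and task-level time covering at least
 * 90% of the sampling window (the threshold is an assumption) suggests
 * a task looping with preemption disabled.
 */
static bool looks_like_task_loop(const struct stall_snap *d,
                                 unsigned long window_ms)
{
    assert(d && window_ms > 0);
    return d->csw == 0 && d->t_system_ms * 10 >= window_ms * 9;
}
```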
|
#
e73dfe30 |
| 04-Aug-2022 |
Zhen Lei <thunder.leizhen@huawei.com> |
sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
The trigger_single_cpu_backtrace() function attempts to send an NMI to the target CPU, which usually provides much better stack traces than the dump_cpu_task() function's approach of dumping that stack from some other CPU. So much so that most calls to dump_cpu_task() only happen after a call to trigger_single_cpu_backtrace() has failed. And the exception to this rule really should attempt to use trigger_single_cpu_backtrace() first.
Therefore, move the trigger_single_cpu_backtrace() invocation into dump_cpu_task().
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>
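The control flow this commit describes, trying the NMI-based backtrace first and falling back to a remote dump only when that fails, is a plain try-then-fall-back pattern. The sketch below models it in user space; all names are hypothetical, and the callbacks merely simulate a platform that does or does not support NMI backtraces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback type: returns true if the target CPU dumped its own stack. */
typedef bool (*backtrace_fn)(int cpu);

enum dump_method { DUMP_NMI, DUMP_REMOTE };

/*
 * Prefer the backtrace taken on the target CPU itself; only if that is
 * unavailable, fall back to dumping the task's stack from another CPU.
 * *remote_dumps counts how often the fallback was needed.
 */
static enum dump_method dump_cpu_task_sketch(int cpu, backtrace_fn try_nmi,
                                             int *remote_dumps)
{
    assert(try_nmi && remote_dumps);
    if (try_nmi(cpu))
        return DUMP_NMI;
    (*remote_dumps)++;                /* stand-in for the remote stack dump */
    return DUMP_REMOTE;
}

/* Simulated platforms for the example. */
static bool nmi_ok(int cpu)          { (void)cpu; return true; }
static bool nmi_unsupported(int cpu) { (void)cpu; return false; }
```

Putting the attempt inside the dumping function means every caller gets the better trace when it is available, without repeating the check at each call site.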
|
#
17147677 |
| 08-Jun-2022 |
Frederic Weisbecker <frederic@kernel.org> |
context_tracking: Convert state to atomic_t
Context tracking's state and dynticks counter are going to be merged into a single field so that both updates can happen atomically and at the same time. Prepare for that by converting the state into an atomic_t.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
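To see why merging a state and a counter into one atomic field lets both change together, consider the following user-space sketch using C11 atomics. The bit layout (two low bits for the state, the remaining bits for the counter), the macro names and `ct_transition()` are assumptions chosen for illustration and are not the kernel's actual layout.

```c
#include <assert.h>
#include <stdatomic.h>

#define CT_STATE_BITS   2                         /* hypothetical width */
#define CT_STATE_MASK   ((1 << CT_STATE_BITS) - 1)
#define CT_COUNTER_ONE  (1 << CT_STATE_BITS)      /* +1 in the counter part */

/* Low bits: context state.  High bits: transition counter. */
static atomic_int ct_state;

/*
 * Switch from old_state to new_state and bump the counter with a single
 * atomic add.  The caller must pass the state currently stored, so the
 * low bits land exactly on new_state without carrying into the counter.
 * Returns the new combined value.
 */
static int ct_transition(int old_state, int new_state)
{
    int delta;

    assert((old_state & ~CT_STATE_MASK) == 0);
    assert((new_state & ~CT_STATE_MASK) == 0);
    delta = CT_COUNTER_ONE + (new_state - old_state);
    return atomic_fetch_add(&ct_state, delta) + delta;
}
```

A remote reader that loads the field once therefore sees a state and a counter that belong to the same transition, which two separate variables could not guarantee.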
|
#
95e04f48 |
| 08-Jun-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/context_tracking: Move dynticks_nmi_nesting to context tracking
The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|
#
904e600e |
| 08-Jun-2022 |
Frederic Weisbecker <frederic@kernel.org> |
rcu/context_tracking: Move dynticks_nesting to context tracking
The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure.
Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Nicolas Saenz Julienne <nsaenz@kernel.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com> Cc: Yu Liao <liaoyu15@huawei.com> Cc: Phil Auld <pauld@redhat.com> Cc: Paul Gortmaker<paul.gortmaker@windriver.com> Cc: Alex Belits <abelits@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|