Documentation/RCU/stallwarn.rst

.. SPDX-License-Identifier: GPL-2.0

==============================
Using RCU's CPU Stall Detector
==============================

This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
options that can be used to fine-tune the detector's operation.  Finally,
this document explains the stall detector's "splat" format.


What Causes RCU CPU Stall Warnings?
===================================

So your kernel printed an RCU CPU stall warning.  The next question is
"What caused it?"  The following problems can result in RCU CPU stall
warnings:

-	A CPU looping in an RCU read-side critical section.

-	A CPU looping with interrupts disabled.

-	A CPU looping with preemption disabled.

-	A CPU looping with bottom halves disabled.

-	For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
	kernel without potentially invoking schedule().  If the looping
	in the kernel is really expected and desirable behavior, you
	might need to add some calls to cond_resched().

-	Booting Linux using a console connection that is too slow to
	keep up with the boot-time console-message rate.  For example,
	a 115Kbaud serial console can be *way* too slow to keep up
	with boot-time message rates, and will frequently result in
	RCU CPU stall warning messages.  Especially if you have added
	debug printk()s.

-	Anything that prevents RCU's grace-period kthreads from running.
	This can result in the "All QSes seen" console-log message.
	This message will include information on when the kthread last
	ran and how often it should be expected to run.  It can also
	result in the ``rcu_.*kthread starved for`` console-log message,
	which will include additional debugging information.

-	A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might
	happen to preempt a low-priority task in the middle of an RCU
	read-side critical section.  This is especially damaging if
	that low-priority task is not permitted to run on any other CPU,
	in which case the next RCU grace period can never complete, which
	will eventually cause the system to run out of memory and hang.
	While the system is in the process of running itself out of
	memory, you might see stall-warning messages.

-	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	is running at a higher priority than the RCU softirq threads.
	This will prevent RCU callbacks from ever being invoked,
	and in a CONFIG_PREEMPT_RCU kernel will further prevent
	RCU grace periods from ever completing.  Either way, the
	system will eventually run out of memory and hang.  In the
	CONFIG_PREEMPT_RCU case, you might see stall-warning
	messages.

	You can use the rcutree.kthread_prio kernel boot parameter to
	increase the scheduling priority of RCU's kthreads, which can
	help avoid this problem.  However, please note that doing this
	can increase your system's context-switch rate and thus degrade
	performance.

-	A periodic interrupt whose handler takes longer than the time
	interval between successive pairs of interrupts.  This can
	prevent RCU's kthreads and softirq handlers from running.
	Note that certain high-overhead debugging options, for example
	the function_graph tracer, can result in interrupt handlers taking
	considerably longer than normal, which can in turn result in
	RCU CPU stall warnings.

-	Testing a workload on a fast system, tuning the stall-warning
	timeout down to just barely avoid RCU CPU stall warnings, and then
	running the same workload with the same stall-warning timeout on a
	slow system.  Note that thermal throttling and on-demand governors
	can cause a single system to be sometimes fast and sometimes slow!

-	A hardware or software issue shuts off the scheduler-clock
	interrupt on a CPU that is not in dyntick-idle mode.  This
	problem really has happened, and seems to be most likely to
	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.

-	A hardware or software issue that prevents time-based wakeups
	from occurring.  These issues can range from misconfigured or
	buggy timer hardware through bugs in the interrupt or exception
	path (whether hardware, firmware, or software) through bugs
	in Linux's timer subsystem through bugs in the scheduler, and,
	yes, even including bugs in RCU itself.  It can also result in
	the ``rcu_.*timer wakeup didn't happen for`` console-log message,
	which will include additional debugging information.

-	A low-level kernel issue that either fails to invoke one of the
	variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
	ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
	hand, or that invokes one of them too many times on the other.
	Historically, the most frequent issue has been an omission
	of either irq_enter() or irq_exit(), which in turn invoke
	ct_irq_enter() or ct_irq_exit(), respectively.  Building your
	kernel with CONFIG_RCU_EQS_DEBUG=y can help track down these types
	of issues, which sometimes arise in architecture-specific code.

-	A bug in the RCU implementation.

-	A hardware failure.  This is quite unlikely, but is not at all
	uncommon in large datacenters.  In one memorable case some decades
	back, a CPU failed in a running system, becoming unresponsive,
	but not causing an immediate crash.  This resulted in a series
	of RCU CPU stall warnings, eventually leading to the realization
	that the CPU had failed.

The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
CPU stall warnings.  Note that SRCU does *not* have CPU stall warnings.
Please note that RCU only detects CPU stalls when there is a grace period
in progress.  No grace period, no CPU stall warnings.

To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
If you have a series of stall warnings from a single extended stall,
comparing the stack traces can often help determine where the stall
is occurring, which will usually be in the function nearest the top of
that portion of the stack which remains the same from trace to trace.
If you can reliably trigger the stall, ftrace can be quite helpful.

RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
and with RCU's event tracing.  For information on RCU's event tracing,
see include/trace/events/rcu.h.


Fine-Tuning the RCU CPU Stall Detector
======================================

The rcupdate.rcu_cpu_stall_suppress module parameter disables RCU's
CPU stall detector, which detects conditions that unduly delay RCU grace
periods.  CPU stall detection is enabled by default, but this may be
overridden via a boot-time parameter or at runtime via sysfs.
The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:

CONFIG_RCU_CPU_STALL_TIMEOUT
----------------------------

	This kernel configuration parameter defines the period of time
	that RCU will wait from the beginning of a grace period until it
	issues an RCU CPU stall warning.  This time period is normally
	21 seconds.

	This configuration parameter may be changed at runtime via
	/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout.  However,
	this parameter is checked only at the beginning of a cycle.
	So if you are 10 seconds into a 40-second stall, setting this
	sysfs parameter to (say) five will shorten the timeout for the
	*next* stall, or the following warning for the current stall
	(assuming the stall lasts long enough).  It will not affect the
	timing of the next warning for the current stall.

	Stall-warning messages may be enabled and disabled completely via
	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.

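The sysfs files named above can be read and written from a script.
The following is a minimal sketch rather than a supported tool: the
helper names ``read_param()`` and ``write_param()`` are hypothetical,
and the ``param_dir`` argument exists only so that the helpers can be
tried against a scratch directory instead of a live system.

```python
from pathlib import Path

# Directory taken from the sysfs paths named in the text above.
PARAM_DIR = Path("/sys/module/rcupdate/parameters")


def read_param(name, param_dir=PARAM_DIR):
    """Return the integer value of an rcupdate module parameter."""
    return int((Path(param_dir) / name).read_text().strip())


def write_param(name, value, param_dir=PARAM_DIR):
    """Write a new value.  On a real system this normally requires root."""
    (Path(param_dir) / name).write_text(f"{value}\n")
```

For example, ``write_param("rcu_cpu_stall_timeout", 5)`` requests a
five-second timeout, which per the text above takes effect only at the
beginning of the next cycle.
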
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
--------------------------------

	Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
	the expedited grace period.  This parameter defines the period
	of time that RCU will wait from the beginning of an expedited
	grace period until it issues an RCU CPU stall warning.  This time
	period is normally 20 milliseconds on Android devices.  A zero
	value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
	after conversion to milliseconds.

	This configuration parameter may be changed at runtime via
	/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout.  However,
	this parameter is checked only at the beginning of a cycle.  If you
	are in a current stall cycle, setting it to a new value will change
	the timeout for the *next* stall.

	Stall-warning messages may be enabled and disabled completely via
	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.

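The zero-value fallback described above amounts to a one-line unit
conversion.  The sketch below only illustrates that rule; the function
name is hypothetical and it does not model any other adjustment the
kernel might apply to the value.

```python
def effective_exp_timeout_ms(exp_timeout_ms, normal_timeout_s):
    """Expedited stall timeout in milliseconds.

    A zero expedited value falls back to the normal stall timeout,
    which is given in seconds and therefore converted to milliseconds.
    """
    if exp_timeout_ms == 0:
        return normal_timeout_s * 1000
    return exp_timeout_ms
```
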
RCU_STALL_DELAY_DELTA
---------------------

	Although the lockdep facility is extremely useful, it does add
	some overhead.  Therefore, under CONFIG_PROVE_RCU, the
	RCU_STALL_DELAY_DELTA macro allows five extra seconds before
	giving an RCU CPU stall warning message.  (This is a cpp
	macro, not a kernel configuration parameter.)

RCU_STALL_RAT_DELAY
-------------------

	The CPU stall detector tries to make the offending CPU print its
	own warnings, as this often gives better-quality stack traces.
	However, if the offending CPU does not detect its own stall in
	the number of jiffies specified by RCU_STALL_RAT_DELAY, then
	some other CPU will complain.  This delay is normally set to
	two jiffies.  (This is a cpp macro, not a kernel configuration
	parameter.)

rcupdate.rcu_task_stall_timeout
-------------------------------

	This boot/sysfs parameter controls the RCU-tasks and
	RCU-tasks-trace stall warning intervals.  A value of zero or less
	suppresses RCU-tasks stall warnings.  A positive value sets the
	stall-warning interval in seconds.  An RCU-tasks stall warning
	starts with the line::

		INFO: rcu_tasks detected stalls on tasks:

	And continues with the output of sched_show_task() for each
	task stalling the current RCU-tasks grace period.

	An RCU-tasks-trace stall warning starts (and continues) similarly::

		INFO: rcu_tasks_trace detected stalls on tasks


Interpreting RCU's CPU Stall-Detector "Splats"
==============================================

For non-RCU-tasks flavors of RCU, when a CPU detects that some other
CPU is stalling, it will print a message similar to the following::

	INFO: rcu_sched detected stalls on CPUs/tasks:
	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
	16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
	(detected by 32, t=2603 jiffies, g=7075, q=625)

This message indicates that CPU 32 detected that CPUs 2 and 16 were both
causing stalls, and that the stall was affecting RCU-sched.  This message
will normally be followed by stack dumps for each CPU.  Please note that
PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
the tasks will be indicated by PID, for example, "P3421".  It is even
possible for an rcu_state stall to be caused by both CPUs *and* tasks,
in which case the offending CPUs and tasks will all be called out in the list.
In some cases, CPUs will detect themselves stalling, which will result
in a self-detected stall.

CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
the RCU core for the past three grace periods.  In contrast, CPU 16's "(0
ticks this GP)" indicates that this CPU has not taken any scheduling-clock
interrupts during the current stalled grace period.

The "idle=" portion of the message prints the dyntick-idle state.
The hex number before the first "/" is the low-order 12 bits of the
dynticks counter, which will have an even-numbered value if the CPU
is in dyntick-idle mode and an odd-numbered value otherwise.  The hex
number between the two "/"s is the value of the nesting, which will be
a small non-negative number if in the idle loop (as shown above) and a
very large positive number otherwise.  The number following the final
"/" is the NMI nesting, which will be a small non-negative number.

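The "idle=" rules above can be applied mechanically when sifting through
many splats.  The following is a rough sketch, assuming the field layout
shown in the sample message; ``decode_idle()`` is a hypothetical helper,
and the two nesting fields are returned as uninterpreted strings because
their exact print format may differ between kernel versions.

```python
import re

# Matches e.g. "idle=06c/0/0" as it appears in the sample splat above.
IDLE_RE = re.compile(r"idle=([0-9a-fA-F]+)/([^/\s]+)/(\S+)")


def decode_idle(line):
    """Decode the idle= field of a per-CPU stall-warning line."""
    m = IDLE_RE.search(line)
    if m is None:
        return None
    counter = int(m.group(1), 16)  # low-order bits of the dynticks counter
    return {
        "dynticks_low_bits": counter,
        # Even means dyntick-idle, odd means not, per the text above.
        "dyntick_idle": counter % 2 == 0,
        "nesting": m.group(2),
        "nmi_nesting": m.group(3),
    }
```

Applied to the line for CPU 2 in the sample message, this reports a
counter of 0x06c, which is even, so that CPU was in dyntick-idle mode.
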
The "softirq=" portion of the message tracks the number of RCU softirq
handlers that the stalled CPU has executed.  The number before the "/"
is the number that had executed since boot at the time that this CPU
last noted the beginning of a grace period, which might be the current
(stalled) grace period, or it might be some earlier grace period (for
example, if the CPU might have been in dyntick-idle mode for an extended
time period).  The number after the "/" is the number that have executed
since boot until the current time.  If this latter number stays constant
across repeated stall-warning messages, it is possible that RCU's softirq
handlers are no longer able to execute on this CPU.  This can happen if
the stalled CPU is spinning with interrupts disabled, or, in -rt
kernels, if a high-priority process is starving RCU's softirq handler.

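Checking whether the second "softirq=" number stays constant across
repeated messages is easy to automate.  This sketch assumes that the
caller has already collected the summary lines for one CPU from
successive warnings; both function names are hypothetical.

```python
import re

SOFTIRQ_RE = re.compile(r"softirq=(\d+)/(\d+)")


def softirq_counts(line):
    """Return (count at last noted grace-period start, count now)."""
    m = SOFTIRQ_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def softirq_stuck(lines):
    """True if the current count is identical across two or more warnings."""
    counts = [softirq_counts(line) for line in lines]
    currents = [c[1] for c in counts if c is not None]
    return len(currents) >= 2 and len(set(currents)) == 1
```

A ``True`` result is only a hint that RCU's softirq handlers are not
running on that CPU, for the reasons given above.
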
The "fqs=" shows the number of force-quiescent-state idle/offline
detection passes that the grace-period kthread has made across this
CPU since the last time that this CPU noted the beginning of a grace
period.

The "detected by" line indicates which CPU detected the stall (in this
case, CPU 32), how many jiffies have elapsed since the start of the grace
period (in this case 2603), the grace-period sequence number (7075), and
an estimate of the total number of RCU callbacks queued across all CPUs
(625 in this case).

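The "detected by" line can be picked apart with a regular expression,
and the jiffies count converted to seconds once the tick rate is known.
In this sketch ``parse_detected_by()`` is a hypothetical helper and
``hz`` must be supplied by the caller to match the kernel's configured
tick rate, which the message itself does not record.

```python
import re

# Matches the start of the "(detected by ...)" line shown above.  The
# closing parenthesis is deliberately not required, in case a kernel
# appends further fields.
DETECTED_RE = re.compile(
    r"\(detected by (\d+), t=(\d+) jiffies, g=(-?\d+), q=(\d+)")


def parse_detected_by(line, hz):
    """Extract the fields of the "detected by" line."""
    m = DETECTED_RE.search(line)
    if m is None:
        return None
    cpu, t, g, q = (int(x) for x in m.groups())
    return {
        "detecting_cpu": cpu,
        "stall_jiffies": t,
        "stall_seconds": t / hz,  # hz is an assumption made by the caller
        "gp_seq": g,
        "callbacks_queued": q,
    }
```

With an assumed tick rate of 100, the sample line's 2603 jiffies
correspond to roughly 26 seconds.
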
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
the following::

	INFO: Stall ended before state dump start

This is rare, but does happen from time to time in real life.  It is also
possible for a zero-jiffy stall to be flagged in this case, depending
on how the stall warning and the grace-period initialization happen to
interact.  Please note that it is not possible to entirely eliminate this
sort of false positive without resorting to things like stop_machine(),
which is overkill for this sort of problem.

If all CPUs and tasks have passed through quiescent states, but the
grace period has nevertheless failed to end, the stall-warning splat
will include something like the following::

	All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0

The "23807" indicates that it has been more than 23 thousand jiffies
since the grace-period kthread ran.  The "jiffies_till_next_fqs"
indicates how frequently that kthread should run, giving the number
of jiffies between force-quiescent-state scans, in this case three,
which is way less than 23807.  Finally, the root rcu_node structure's
->qsmask field is printed, which will normally be zero.

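The arithmetic in the "All QSes seen" line can be checked directly: the
first number is the difference of the two jiffies values in parentheses.
The following sketch assumes the exact layout of the sample line above;
``parse_all_qs_seen()`` and the ``missed_fqs_intervals`` figure are
illustrative inventions, not anything the kernel reports.

```python
import re

ALL_QS_RE = re.compile(
    r"All QSes seen, last (\S+) kthread activity (\d+) \((\d+)-(\d+)\), "
    r"jiffies_till_next_fqs=(\d+), root ->qsmask (0x[0-9a-fA-F]+)")


def parse_all_qs_seen(line):
    """Decode the "All QSes seen" line of a stall warning."""
    m = ALL_QS_RE.search(line)
    if m is None:
        return None
    since, now, last = int(m.group(2)), int(m.group(3)), int(m.group(4))
    fqs = int(m.group(5))
    return {
        "flavor": m.group(1),
        "jiffies_since_kthread_ran": since,
        # The two jiffies values should differ by the reported amount.
        "consistent": now - last == since,
        "jiffies_till_next_fqs": fqs,
        # How many scan intervals fit into the gap (illustrative only).
        "missed_fqs_intervals": since // fqs if fqs else None,
        "root_qsmask": int(m.group(6), 16),
    }
```

For the sample line, 4297905177 minus 4297881370 is indeed 23807, and
that gap spans several thousand three-jiffy scan intervals.
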
If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed::

	rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.

Starving the grace-period kthreads of CPU time can of course result
in RCU CPU stall warnings even when all CPUs and tasks have passed
through the required quiescent states.  The "g" number shows the current
grace-period sequence number, the "f" precedes the ->gp_flags command
to the grace-period kthread, the "RCU_GP_WAIT_FQS" indicates that the
kthread is waiting for a short timeout, the "state" precedes the value of
the task_struct ->state field, and the "cpu" indicates that the
grace-period kthread last ran on CPU 5.

If the relevant grace-period kthread does not wake from FQS wait in a
reasonable time, then the following additional line is printed::

	kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402

The "23804" indicates that the kthread's timer expired more than 23
thousand jiffies ago.  The rest of the line has meaning similar to the
kthread starvation case.

Additionally, the following line is printed::

	Possible timer handling issue on cpu=4 timer-softirq=11142

Here "cpu" indicates that the grace-period kthread last ran on CPU 4,
where it queued the fqs timer.  The number following the "timer-softirq"
is the current ``TIMER_SOFTIRQ`` count on CPU 4.  If this value does not
change on successive RCU CPU stall warnings, there is further reason to
suspect a timer problem.

These messages are usually followed by stack dumps of the CPUs and tasks
involved in the stall.  These stack traces can help you locate the cause
of the stall, keeping in mind that the CPU detecting the stall will have
an interrupt frame that is mainly devoted to detecting the stall.


Multiple Warnings From One Stall
================================

If a stall lasts long enough, multiple stall-warning messages will
be printed for it.  The second and subsequent messages are printed at
longer intervals, so that the time between (say) the first and second
message will be about three times the interval between the beginning
of the stall and the first message.  It can be helpful to compare the
stack dumps for the different messages for the same stalled grace period.


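To get a feel for when repeated warnings should appear, the spacing
described above can be tabulated.  This sketch assumes that every later
warning also follows its predecessor by three times the initial timeout,
which the text states only for the first and second messages, and it
ignores any small fixed offsets.  The function name is hypothetical.

```python
def warning_times(timeout_s, count):
    """Approximate seconds since stall start for the first `count` warnings."""
    times = []
    t = timeout_s  # the first warning arrives one timeout into the stall
    for _ in range(count):
        times.append(t)
        t += 3 * timeout_s  # assumed gap before each later warning
    return times
```

With the usual 21-second timeout, ``warning_times(21, 3)`` gives
``[21, 84, 147]``.
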
Stall Warnings for Expedited Grace Periods
==========================================

If an expedited grace period detects a stall, it will place a message
like the following in dmesg::

	INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.

This indicates that CPU 7 has failed to respond to a reschedule IPI.
The three periods (".") following the CPU number indicate that the CPU
is online (otherwise the first period would instead have been "O"),
that the CPU was online at the beginning of the expedited grace period
(otherwise the second period would have instead been "o"), and that
the CPU has been online at least once since boot (otherwise, the third
period would instead have been "N").  The number before the "jiffies"
indicates that the expedited grace period has been going on for 21,119
jiffies.  The number following the "s:" indicates that the expedited
grace-period sequence counter is 73.  The fact that this last value is
odd indicates that an expedited grace period is in flight.  The number
following "root:" is a bitmask that indicates which children of the root
rcu_node structure correspond to CPUs and/or tasks that are blocking the
current expedited grace period.  If the tree had more than one level,
additional hex numbers would be printed for the states of the other
rcu_node structures in the tree.

As with normal grace periods, PREEMPT_RCU builds can be stalled by
tasks as well as by CPUs, and the tasks will be indicated by PID,
for example, "P3421".

It is entirely possible to see stall warnings from normal and from
expedited grace periods at about the same time during the same run.

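The fields of an expedited stall message can be decoded as described
above.  The sketch below assumes the single-line layout of the sample
message and further assumes that any "P<pid>" task entries appear inside
the braces alongside the CPUs; ``parse_expedited()`` is a hypothetical
helper.

```python
import re

EXP_RE = re.compile(
    r"detected expedited stalls on CPUs/tasks: \{(.*?)\} "
    r"(\d+) jiffies s: (\d+) root: (0x[0-9a-fA-F]+)")
CPU_RE = re.compile(r"(\d+)-(\S{3})")   # e.g. "7-..."
TASK_RE = re.compile(r"P(\d+)")         # e.g. "P3421" (assumed placement)


def parse_expedited(line):
    """Decode an expedited-grace-period stall message."""
    m = EXP_RE.search(line)
    if m is None:
        return None
    cpus = []
    for cpu, flags in CPU_RE.findall(m.group(1)):
        cpus.append({
            "cpu": int(cpu),
            "online_now": flags[0] != "O",
            "online_at_gp_start": flags[1] != "o",
            "ever_online": flags[2] != "N",
        })
    seq = int(m.group(3))
    return {
        "cpus": cpus,
        "tasks": [int(p) for p in TASK_RE.findall(m.group(1))],
        "jiffies": int(m.group(2)),
        "sequence": seq,
        "gp_in_flight": seq % 2 == 1,  # odd means a grace period is in flight
        "root_mask": int(m.group(4), 16),
    }
```
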
RCU_CPU_STALL_CPUTIME
=====================

In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
rcupdate.rcu_cpu_stall_cputime=1, the following additional information
is supplied with each RCU CPU stall warning::

  rcu:          hardirqs   softirqs   csw/system
  rcu:  number:      624         45            0
  rcu: cputime:       69          1         2425   ==> 2500(ms)

These statistics are collected during the sampling period. The values
in row "number:" are the number of hard interrupts, number of soft
interrupts, and number of context switches on the stalled CPU. The
first three values in row "cputime:" indicate the CPU time in
milliseconds consumed by hard interrupts, soft interrupts, and tasks
on the stalled CPU.  The last number is the measurement interval, again
in milliseconds.  Because user-mode tasks normally do not cause RCU CPU
stalls, these tasks are typically kernel tasks, which is why only the
system CPU time is considered.

The sampling period is shown as follows::

  |<------------first timeout---------->|<-----second timeout----->|
  |<--half timeout-->|<--half timeout-->|                          |
  |                  |<--first period-->|                          |
  |                  |<-----------second sampling period---------->|
  |                  |                  |                          |
             snapshot time point    1st-stall                  2nd-stall

The following describes four typical scenarios:

1. A CPU looping with interrupts disabled.

   ::

     rcu:          hardirqs   softirqs   csw/system
     rcu:  number:        0          0            0
     rcu: cputime:        0          0            0   ==> 2500(ms)

   Because interrupts have been disabled throughout the measurement
   interval, there are no interrupts and no context switches.
   Furthermore, because CPU time consumption was measured using interrupt
   handlers, the system CPU consumption is misleadingly measured as zero.
   This scenario will normally also have "(0 ticks this GP)" printed on
   this CPU's summary line.

2. A CPU looping with bottom halves disabled.

   This is similar to the previous example, but with a non-zero number
   of hard interrupts and non-zero CPU time consumed by them, along with
   non-zero CPU time consumed by in-kernel execution::

     rcu:          hardirqs   softirqs   csw/system
     rcu:  number:      624          0            0
     rcu: cputime:       49          0         2446   ==> 2500(ms)

   The fact that there are zero softirqs gives a hint that these were
   disabled, perhaps via local_bh_disable().  It is of course possible
   that there were no softirqs, perhaps because all events that would
   result in softirq execution are confined to other CPUs.  In this case,
   the diagnosis should continue as shown in the next example.

3. A CPU looping with preemption disabled.

   Here, only the number of context switches is zero::

     rcu:          hardirqs   softirqs   csw/system
     rcu:  number:      624         45            0
     rcu: cputime:       69          1         2425   ==> 2500(ms)

   This situation hints that the stalled CPU was looping with preemption
   disabled.

4. No looping, but massive hard and soft interrupts.

   ::

     rcu:          hardirqs   softirqs   csw/system
     rcu:  number:       xx         xx            0
     rcu: cputime:       xx         xx            0   ==> 2500(ms)

   Here, the number and CPU time of hard interrupts are both non-zero,
   but the number of context switches and the in-kernel CPU time consumed
   are zero. The number and cputime of soft interrupts will usually be
   non-zero, but could be zero, for example, if the CPU was spinning
   within a single hard interrupt handler.

   If this type of RCU CPU stall warning can be reproduced, you can
   narrow it down by looking at /proc/interrupts or by writing code to
   trace each interrupt, for example, by referring to show_interrupts().
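
The four scenarios above can be turned into a rough first-pass
classifier.  This is only a heuristic sketch built from the examples in
this section: ``classify()`` and its result strings are hypothetical,
and a real diagnosis should still consult the stack traces.

```python
import re

NUMBER_RE = re.compile(r"number:\s+(\d+)\s+(\d+)\s+(\d+)")
CPUTIME_RE = re.compile(
    r"cputime:\s+(\d+)\s+(\d+)\s+(\d+)\s+==>\s+(\d+)\(ms\)")


def classify(text):
    """Guess which scenario a cputime report most resembles."""
    n = NUMBER_RE.search(text)
    c = CPUTIME_RE.search(text)
    if n is None or c is None:
        return None
    hardirqs, softirqs, csw = (int(x) for x in n.groups())
    _hard_ms, _soft_ms, system_ms, _interval_ms = (int(x) for x in c.groups())
    if hardirqs == 0 and softirqs == 0 and csw == 0 and system_ms == 0:
        return "interrupts disabled"
    if csw == 0 and system_ms == 0 and hardirqs > 0:
        return "interrupt storm"
    if csw == 0 and softirqs == 0 and hardirqs > 0:
        return "bottom halves disabled (or no softirqs raised)"
    if csw == 0:
        return "preemption disabled"
    return "unclassified"
```

The order of the checks matters: the interrupt-storm test comes before
the bottom-halves test because, as noted in the fourth scenario, a CPU
spinning within a single hard interrupt handler may also show zero
softirqs.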