xref: /linux/Documentation/RCU/stallwarn.rst (revision 5e013dc1)
1f2286ab9SMauro Carvalho Chehab.. SPDX-License-Identifier: GPL-2.0
2f2286ab9SMauro Carvalho Chehab
3f2286ab9SMauro Carvalho Chehab==============================
4f2286ab9SMauro Carvalho ChehabUsing RCU's CPU Stall Detector
5f2286ab9SMauro Carvalho Chehab==============================
6f2286ab9SMauro Carvalho Chehab
7f2286ab9SMauro Carvalho ChehabThis document first discusses what sorts of issues RCU's CPU stall
8f2286ab9SMauro Carvalho Chehabdetector can locate, and then discusses kernel parameters and Kconfig
9f2286ab9SMauro Carvalho Chehaboptions that can be used to fine-tune the detector's operation.  Finally,
10f2286ab9SMauro Carvalho Chehabthis document explains the stall detector's "splat" format.
11f2286ab9SMauro Carvalho Chehab
12f2286ab9SMauro Carvalho Chehab
13f2286ab9SMauro Carvalho ChehabWhat Causes RCU CPU Stall Warnings?
14f2286ab9SMauro Carvalho Chehab===================================
15f2286ab9SMauro Carvalho Chehab
16f2286ab9SMauro Carvalho ChehabSo your kernel printed an RCU CPU stall warning.  The next question is
17f2286ab9SMauro Carvalho Chehab"What caused it?"  The following problems can result in RCU CPU stall
18f2286ab9SMauro Carvalho Chehabwarnings:
19f2286ab9SMauro Carvalho Chehab
20f2286ab9SMauro Carvalho Chehab-	A CPU looping in an RCU read-side critical section.
21f2286ab9SMauro Carvalho Chehab
22f2286ab9SMauro Carvalho Chehab-	A CPU looping with interrupts disabled.
23f2286ab9SMauro Carvalho Chehab
24f2286ab9SMauro Carvalho Chehab-	A CPU looping with preemption disabled.
25f2286ab9SMauro Carvalho Chehab
26f2286ab9SMauro Carvalho Chehab-	A CPU looping with bottom halves disabled.
27f2286ab9SMauro Carvalho Chehab
283abf176dSPaul E. McKenney-	For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
293abf176dSPaul E. McKenney	kernel without potentially invoking schedule().  If the looping
303abf176dSPaul E. McKenney	in the kernel is really expected and desirable behavior, you
313abf176dSPaul E. McKenney	might need to add some calls to cond_resched().
32f2286ab9SMauro Carvalho Chehab
33f2286ab9SMauro Carvalho Chehab-	Booting Linux using a console connection that is too slow to
34f2286ab9SMauro Carvalho Chehab	keep up with the boot-time console-message rate.  For example,
35e3879ecdSAkira Yokosawa	a 115Kbaud serial console can be *way* too slow to keep up
36f2286ab9SMauro Carvalho Chehab	with boot-time message rates, and will frequently result in
37f2286ab9SMauro Carvalho Chehab	RCU CPU stall warning messages.  Especially if you have added
38f2286ab9SMauro Carvalho Chehab	debug printk()s.
39f2286ab9SMauro Carvalho Chehab
40f2286ab9SMauro Carvalho Chehab-	Anything that prevents RCU's grace-period kthreads from running.
41f2286ab9SMauro Carvalho Chehab	This can result in the "All QSes seen" console-log message.
42f2286ab9SMauro Carvalho Chehab	This message will include information on when the kthread last
43f2286ab9SMauro Carvalho Chehab	ran and how often it should be expected to run.  It can also
44f2286ab9SMauro Carvalho Chehab	result in the ``rcu_.*kthread starved for`` console-log message,
45f2286ab9SMauro Carvalho Chehab	which will include additional debugging information.
46f2286ab9SMauro Carvalho Chehab
4781ad58beSSebastian Andrzej Siewior-	A CPU-bound real-time task in a CONFIG_PREEMPTION kernel, which might
48f2286ab9SMauro Carvalho Chehab	happen to preempt a low-priority task in the middle of an RCU
49f2286ab9SMauro Carvalho Chehab	read-side critical section.   This is especially damaging if
50f2286ab9SMauro Carvalho Chehab	that low-priority task is not permitted to run on any other CPU,
51f2286ab9SMauro Carvalho Chehab	in which case the next RCU grace period can never complete, which
52f2286ab9SMauro Carvalho Chehab	will eventually cause the system to run out of memory and hang.
53f2286ab9SMauro Carvalho Chehab	While the system is in the process of running itself out of
54f2286ab9SMauro Carvalho Chehab	memory, you might see stall-warning messages.
55f2286ab9SMauro Carvalho Chehab
56f2286ab9SMauro Carvalho Chehab-	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
57f2286ab9SMauro Carvalho Chehab	is running at a higher priority than the RCU softirq threads.
58f2286ab9SMauro Carvalho Chehab	This will prevent RCU callbacks from ever being invoked,
59f2286ab9SMauro Carvalho Chehab	and in a CONFIG_PREEMPT_RCU kernel will further prevent
60f2286ab9SMauro Carvalho Chehab	RCU grace periods from ever completing.  Either way, the
61f2286ab9SMauro Carvalho Chehab	system will eventually run out of memory and hang.  In the
62f2286ab9SMauro Carvalho Chehab	CONFIG_PREEMPT_RCU case, you might see stall-warning
63f2286ab9SMauro Carvalho Chehab	messages.
64f2286ab9SMauro Carvalho Chehab
65f2286ab9SMauro Carvalho Chehab	You can use the rcutree.kthread_prio kernel boot parameter to
66f2286ab9SMauro Carvalho Chehab	increase the scheduling priority of RCU's kthreads, which can
67f2286ab9SMauro Carvalho Chehab	help avoid this problem.  However, please note that doing this
68f2286ab9SMauro Carvalho Chehab	can increase your system's context-switch rate and thus degrade
69f2286ab9SMauro Carvalho Chehab	performance.
70f2286ab9SMauro Carvalho Chehab
71f2286ab9SMauro Carvalho Chehab-	A periodic interrupt whose handler takes longer than the time
72f2286ab9SMauro Carvalho Chehab	interval between successive pairs of interrupts.  This can
73f2286ab9SMauro Carvalho Chehab	prevent RCU's kthreads and softirq handlers from running.
74f2286ab9SMauro Carvalho Chehab	Note that certain high-overhead debugging options, for example
75f2286ab9SMauro Carvalho Chehab	the function_graph tracer, can result in interrupt handler taking
76f2286ab9SMauro Carvalho Chehab	considerably longer than normal, which can in turn result in
77f2286ab9SMauro Carvalho Chehab	RCU CPU stall warnings.
78f2286ab9SMauro Carvalho Chehab
79f2286ab9SMauro Carvalho Chehab-	Testing a workload on a fast system, tuning the stall-warning
80f2286ab9SMauro Carvalho Chehab	timeout down to just barely avoid RCU CPU stall warnings, and then
81f2286ab9SMauro Carvalho Chehab	running the same workload with the same stall-warning timeout on a
82f2286ab9SMauro Carvalho Chehab	slow system.  Note that thermal throttling and on-demand governors
83f2286ab9SMauro Carvalho Chehab	can cause a single system to be sometimes fast and sometimes slow!
84f2286ab9SMauro Carvalho Chehab
85f2286ab9SMauro Carvalho Chehab-	A hardware or software issue shuts off the scheduler-clock
86f2286ab9SMauro Carvalho Chehab	interrupt on a CPU that is not in dyntick-idle mode.  This
87f2286ab9SMauro Carvalho Chehab	problem really has happened, and seems to be most likely to
88f2286ab9SMauro Carvalho Chehab	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
89f2286ab9SMauro Carvalho Chehab
90b81898e3SPaul E. McKenney-	A hardware or software issue that prevents time-based wakeups
91b81898e3SPaul E. McKenney	from occurring.  These issues can range from misconfigured or
92b81898e3SPaul E. McKenney	buggy timer hardware through bugs in the interrupt or exception
93b81898e3SPaul E. McKenney	path (whether hardware, firmware, or software) through bugs
94b81898e3SPaul E. McKenney	in Linux's timer subsystem through bugs in the scheduler, and,
95683954e5SNeeraj Upadhyay	yes, even including bugs in RCU itself.  It can also result in
96683954e5SNeeraj Upadhyay	the ``rcu_.*timer wakeup didn't happen for`` console-log message,
97683954e5SNeeraj Upadhyay	which will include additional debugging information.
98b81898e3SPaul E. McKenney
9913bc8fa8SPaul E. McKenney-	A low-level kernel issue that either fails to invoke one of the
100c33ef43aSFrederic Weisbecker	variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
1016f0e6c15SFrederic Weisbecker	ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
10213bc8fa8SPaul E. McKenney	hand, or that invokes one of them too many times on the other.
10313bc8fa8SPaul E. McKenney	Historically, the most frequent issue has been an omission
10413bc8fa8SPaul E. McKenney	of either irq_enter() or irq_exit(), which in turn invoke
1056f0e6c15SFrederic Weisbecker	ct_irq_enter() or ct_irq_exit(), respectively.  Building your
10613bc8fa8SPaul E. McKenney	kernel with CONFIG_RCU_EQS_DEBUG=y can help track down these types
10713bc8fa8SPaul E. McKenney	of issues, which sometimes arise in architecture-specific code.
10813bc8fa8SPaul E. McKenney
109f2286ab9SMauro Carvalho Chehab-	A bug in the RCU implementation.
110f2286ab9SMauro Carvalho Chehab
1113abf176dSPaul E. McKenney-	A hardware failure.  This is quite unlikely, but is not at all
1123abf176dSPaul E. McKenney	uncommon in large datacenter.  In one memorable case some decades
1133abf176dSPaul E. McKenney	back, a CPU failed in a running system, becoming unresponsive,
1143abf176dSPaul E. McKenney	but not causing an immediate crash.  This resulted in a series
1153abf176dSPaul E. McKenney	of RCU CPU stall warnings, eventually leading the realization
1163abf176dSPaul E. McKenney	that the CPU had failed.
117f2286ab9SMauro Carvalho Chehab
1183abf176dSPaul E. McKenneyThe RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
1193abf176dSPaul E. McKenneyCPU stall warning.  Note that SRCU does *not* have CPU stall warnings.
1203abf176dSPaul E. McKenneyPlease note that RCU only detects CPU stalls when there is a grace period
1213abf176dSPaul E. McKenneyin progress.  No grace period, no CPU stall warnings.
122f2286ab9SMauro Carvalho Chehab
123f2286ab9SMauro Carvalho ChehabTo diagnose the cause of the stall, inspect the stack traces.
124f2286ab9SMauro Carvalho ChehabThe offending function will usually be near the top of the stack.
125f2286ab9SMauro Carvalho ChehabIf you have a series of stall warnings from a single extended stall,
126f2286ab9SMauro Carvalho Chehabcomparing the stack traces can often help determine where the stall
127f2286ab9SMauro Carvalho Chehabis occurring, which will usually be in the function nearest the top of
128f2286ab9SMauro Carvalho Chehabthat portion of the stack which remains the same from trace to trace.
129f2286ab9SMauro Carvalho ChehabIf you can reliably trigger the stall, ftrace can be quite helpful.
130f2286ab9SMauro Carvalho Chehab
131f2286ab9SMauro Carvalho ChehabRCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
132f2286ab9SMauro Carvalho Chehaband with RCU's event tracing.  For information on RCU's event tracing,
133f2286ab9SMauro Carvalho Chehabsee include/trace/events/rcu.h.
134f2286ab9SMauro Carvalho Chehab
135f2286ab9SMauro Carvalho Chehab
136f2286ab9SMauro Carvalho ChehabFine-Tuning the RCU CPU Stall Detector
137f2286ab9SMauro Carvalho Chehab======================================
138f2286ab9SMauro Carvalho Chehab
139f2286ab9SMauro Carvalho ChehabThe rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
140f2286ab9SMauro Carvalho ChehabCPU stall detector, which detects conditions that unduly delay RCU grace
141f2286ab9SMauro Carvalho Chehabperiods.  This module parameter enables CPU stall detection by default,
142f2286ab9SMauro Carvalho Chehabbut may be overridden via boot-time parameter or at runtime via sysfs.
143f2286ab9SMauro Carvalho ChehabThe stall detector's idea of what constitutes "unduly delayed" is
144f2286ab9SMauro Carvalho Chehabcontrolled by a set of kernel configuration variables and cpp macros:
145f2286ab9SMauro Carvalho Chehab
146f2286ab9SMauro Carvalho ChehabCONFIG_RCU_CPU_STALL_TIMEOUT
147f2286ab9SMauro Carvalho Chehab----------------------------
148f2286ab9SMauro Carvalho Chehab
149f2286ab9SMauro Carvalho Chehab	This kernel configuration parameter defines the period of time
150f2286ab9SMauro Carvalho Chehab	that RCU will wait from the beginning of a grace period until it
151f2286ab9SMauro Carvalho Chehab	issues an RCU CPU stall warning.  This time period is normally
152f2286ab9SMauro Carvalho Chehab	21 seconds.
153f2286ab9SMauro Carvalho Chehab
154f2286ab9SMauro Carvalho Chehab	This configuration parameter may be changed at runtime via the
155f2286ab9SMauro Carvalho Chehab	/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout, however
156f2286ab9SMauro Carvalho Chehab	this parameter is checked only at the beginning of a cycle.
157f2286ab9SMauro Carvalho Chehab	So if you are 10 seconds into a 40-second stall, setting this
158f2286ab9SMauro Carvalho Chehab	sysfs parameter to (say) five will shorten the timeout for the
159e3879ecdSAkira Yokosawa	*next* stall, or the following warning for the current stall
160f2286ab9SMauro Carvalho Chehab	(assuming the stall lasts long enough).  It will not affect the
161f2286ab9SMauro Carvalho Chehab	timing of the next warning for the current stall.
162f2286ab9SMauro Carvalho Chehab
163f2286ab9SMauro Carvalho Chehab	Stall-warning messages may be enabled and disabled completely via
164f2286ab9SMauro Carvalho Chehab	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
165f2286ab9SMauro Carvalho Chehab
16628b3ae42SUladzislau RezkiCONFIG_RCU_EXP_CPU_STALL_TIMEOUT
16728b3ae42SUladzislau Rezki--------------------------------
16828b3ae42SUladzislau Rezki
16928b3ae42SUladzislau Rezki	Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
17028b3ae42SUladzislau Rezki	the expedited grace period. This parameter defines the period
17128b3ae42SUladzislau Rezki	of time that RCU will wait from the beginning of an expedited
17228b3ae42SUladzislau Rezki	grace period until it issues an RCU CPU stall warning. This time
17328b3ae42SUladzislau Rezki	period is normally 20 milliseconds on Android devices.	A zero
17428b3ae42SUladzislau Rezki	value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
17528b3ae42SUladzislau Rezki	after conversion to milliseconds.
17628b3ae42SUladzislau Rezki
17728b3ae42SUladzislau Rezki	This configuration parameter may be changed at runtime via the
17828b3ae42SUladzislau Rezki	/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
17928b3ae42SUladzislau Rezki	this parameter is checked only at the beginning of a cycle. If you
18028b3ae42SUladzislau Rezki	are in a current stall cycle, setting it to a new value will change
18128b3ae42SUladzislau Rezki	the timeout for the -next- stall.
18228b3ae42SUladzislau Rezki
18328b3ae42SUladzislau Rezki	Stall-warning messages may be enabled and disabled completely via
18428b3ae42SUladzislau Rezki	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
18528b3ae42SUladzislau Rezki
186f2286ab9SMauro Carvalho ChehabRCU_STALL_DELAY_DELTA
187f2286ab9SMauro Carvalho Chehab---------------------
188f2286ab9SMauro Carvalho Chehab
189f2286ab9SMauro Carvalho Chehab	Although the lockdep facility is extremely useful, it does add
190f2286ab9SMauro Carvalho Chehab	some overhead.  Therefore, under CONFIG_PROVE_RCU, the
191f2286ab9SMauro Carvalho Chehab	RCU_STALL_DELAY_DELTA macro allows five extra seconds before
192f2286ab9SMauro Carvalho Chehab	giving an RCU CPU stall warning message.  (This is a cpp
193f2286ab9SMauro Carvalho Chehab	macro, not a kernel configuration parameter.)
194f2286ab9SMauro Carvalho Chehab
195f2286ab9SMauro Carvalho ChehabRCU_STALL_RAT_DELAY
196f2286ab9SMauro Carvalho Chehab-------------------
197f2286ab9SMauro Carvalho Chehab
198f2286ab9SMauro Carvalho Chehab	The CPU stall detector tries to make the offending CPU print its
199f2286ab9SMauro Carvalho Chehab	own warnings, as this often gives better-quality stack traces.
200f2286ab9SMauro Carvalho Chehab	However, if the offending CPU does not detect its own stall in
201f2286ab9SMauro Carvalho Chehab	the number of jiffies specified by RCU_STALL_RAT_DELAY, then
202f2286ab9SMauro Carvalho Chehab	some other CPU will complain.  This delay is normally set to
203f2286ab9SMauro Carvalho Chehab	two jiffies.  (This is a cpp macro, not a kernel configuration
204f2286ab9SMauro Carvalho Chehab	parameter.)
205f2286ab9SMauro Carvalho Chehab
206f2286ab9SMauro Carvalho Chehabrcupdate.rcu_task_stall_timeout
207f2286ab9SMauro Carvalho Chehab-------------------------------
208f2286ab9SMauro Carvalho Chehab
2093abf176dSPaul E. McKenney	This boot/sysfs parameter controls the RCU-tasks and
2103abf176dSPaul E. McKenney	RCU-tasks-trace stall warning intervals.  A value of zero or less
2113abf176dSPaul E. McKenney	suppresses RCU-tasks stall warnings.  A positive value sets the
2123abf176dSPaul E. McKenney	stall-warning interval in seconds.  An RCU-tasks stall warning
2133abf176dSPaul E. McKenney	starts with the line:
214f2286ab9SMauro Carvalho Chehab
215f2286ab9SMauro Carvalho Chehab		INFO: rcu_tasks detected stalls on tasks:
216f2286ab9SMauro Carvalho Chehab
217f2286ab9SMauro Carvalho Chehab	And continues with the output of sched_show_task() for each
218f2286ab9SMauro Carvalho Chehab	task stalling the current RCU-tasks grace period.
219f2286ab9SMauro Carvalho Chehab
2203abf176dSPaul E. McKenney	An RCU-tasks-trace stall warning starts (and continues) similarly:
2213abf176dSPaul E. McKenney
2223abf176dSPaul E. McKenney		INFO: rcu_tasks_trace detected stalls on tasks
2233abf176dSPaul E. McKenney
224f2286ab9SMauro Carvalho Chehab
225f2286ab9SMauro Carvalho ChehabInterpreting RCU's CPU Stall-Detector "Splats"
226f2286ab9SMauro Carvalho Chehab==============================================
227f2286ab9SMauro Carvalho Chehab
22899c0974fSPaul E. McKenneyFor non-RCU-tasks flavors of RCU, when a CPU detects that some other
22999c0974fSPaul E. McKenneyCPU is stalling, it will print a message similar to the following::
230f2286ab9SMauro Carvalho Chehab
231f2286ab9SMauro Carvalho Chehab	INFO: rcu_sched detected stalls on CPUs/tasks:
232f2286ab9SMauro Carvalho Chehab	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
233f2286ab9SMauro Carvalho Chehab	16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
234f2286ab9SMauro Carvalho Chehab	(detected by 32, t=2603 jiffies, g=7075, q=625)
235f2286ab9SMauro Carvalho Chehab
236f2286ab9SMauro Carvalho ChehabThis message indicates that CPU 32 detected that CPUs 2 and 16 were both
237f2286ab9SMauro Carvalho Chehabcausing stalls, and that the stall was affecting RCU-sched.  This message
238f2286ab9SMauro Carvalho Chehabwill normally be followed by stack dumps for each CPU.  Please note that
239f2286ab9SMauro Carvalho ChehabPREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
240f2286ab9SMauro Carvalho Chehabthe tasks will be indicated by PID, for example, "P3421".  It is even
241e3879ecdSAkira Yokosawapossible for an rcu_state stall to be caused by both CPUs *and* tasks,
242f2286ab9SMauro Carvalho Chehabin which case the offending CPUs and tasks will all be called out in the list.
24399c0974fSPaul E. McKenneyIn some cases, CPUs will detect themselves stalling, which will result
24499c0974fSPaul E. McKenneyin a self-detected stall.
245f2286ab9SMauro Carvalho Chehab
246f2286ab9SMauro Carvalho ChehabCPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
247f2286ab9SMauro Carvalho Chehabthe RCU core for the past three grace periods.  In contrast, CPU 16's "(0
248f2286ab9SMauro Carvalho Chehabticks this GP)" indicates that this CPU has not taken any scheduling-clock
249f2286ab9SMauro Carvalho Chehabinterrupts during the current stalled grace period.
250f2286ab9SMauro Carvalho Chehab
251f2286ab9SMauro Carvalho ChehabThe "idle=" portion of the message prints the dyntick-idle state.
252f2286ab9SMauro Carvalho ChehabThe hex number before the first "/" is the low-order 12 bits of the
253f2286ab9SMauro Carvalho Chehabdynticks counter, which will have an even-numbered value if the CPU
254f2286ab9SMauro Carvalho Chehabis in dyntick-idle mode and an odd-numbered value otherwise.  The hex
255f2286ab9SMauro Carvalho Chehabnumber between the two "/"s is the value of the nesting, which will be
256f2286ab9SMauro Carvalho Chehaba small non-negative number if in the idle loop (as shown above) and a
2573abf176dSPaul E. McKenneyvery large positive number otherwise.  The number following the final
2583abf176dSPaul E. McKenney"/" is the NMI nesting, which will be a small non-negative number.
259f2286ab9SMauro Carvalho Chehab
260f2286ab9SMauro Carvalho ChehabThe "softirq=" portion of the message tracks the number of RCU softirq
261f2286ab9SMauro Carvalho Chehabhandlers that the stalled CPU has executed.  The number before the "/"
262f2286ab9SMauro Carvalho Chehabis the number that had executed since boot at the time that this CPU
263f2286ab9SMauro Carvalho Chehablast noted the beginning of a grace period, which might be the current
264f2286ab9SMauro Carvalho Chehab(stalled) grace period, or it might be some earlier grace period (for
265f2286ab9SMauro Carvalho Chehabexample, if the CPU might have been in dyntick-idle mode for an extended
2669984fd7eSHaocheng Xietime period).  The number after the "/" is the number that have executed
267f2286ab9SMauro Carvalho Chehabsince boot until the current time.  If this latter number stays constant
268f2286ab9SMauro Carvalho Chehabacross repeated stall-warning messages, it is possible that RCU's softirq
269f2286ab9SMauro Carvalho Chehabhandlers are no longer able to execute on this CPU.  This can happen if
270f2286ab9SMauro Carvalho Chehabthe stalled CPU is spinning with interrupts are disabled, or, in -rt
271f2286ab9SMauro Carvalho Chehabkernels, if a high-priority process is starving RCU's softirq handler.
272f2286ab9SMauro Carvalho Chehab
273f2286ab9SMauro Carvalho ChehabThe "fqs=" shows the number of force-quiescent-state idle/offline
274f2286ab9SMauro Carvalho Chehabdetection passes that the grace-period kthread has made across this
275f2286ab9SMauro Carvalho ChehabCPU since the last time that this CPU noted the beginning of a grace
276f2286ab9SMauro Carvalho Chehabperiod.
277f2286ab9SMauro Carvalho Chehab
278f2286ab9SMauro Carvalho ChehabThe "detected by" line indicates which CPU detected the stall (in this
279f2286ab9SMauro Carvalho Chehabcase, CPU 32), how many jiffies have elapsed since the start of the grace
280f2286ab9SMauro Carvalho Chehabperiod (in this case 2603), the grace-period sequence number (7075), and
281f2286ab9SMauro Carvalho Chehaban estimate of the total number of RCU callbacks queued across all CPUs
282f2286ab9SMauro Carvalho Chehab(625 in this case).
283f2286ab9SMauro Carvalho Chehab
284f2286ab9SMauro Carvalho ChehabIf the grace period ends just as the stall warning starts printing,
285f2286ab9SMauro Carvalho Chehabthere will be a spurious stall-warning message, which will include
286f2286ab9SMauro Carvalho Chehabthe following::
287f2286ab9SMauro Carvalho Chehab
288f2286ab9SMauro Carvalho Chehab	INFO: Stall ended before state dump start
289f2286ab9SMauro Carvalho Chehab
290f2286ab9SMauro Carvalho ChehabThis is rare, but does happen from time to time in real life.  It is also
291f2286ab9SMauro Carvalho Chehabpossible for a zero-jiffy stall to be flagged in this case, depending
292f2286ab9SMauro Carvalho Chehabon how the stall warning and the grace-period initialization happen to
293f2286ab9SMauro Carvalho Chehabinteract.  Please note that it is not possible to entirely eliminate this
294f2286ab9SMauro Carvalho Chehabsort of false positive without resorting to things like stop_machine(),
295f2286ab9SMauro Carvalho Chehabwhich is overkill for this sort of problem.
296f2286ab9SMauro Carvalho Chehab
297f2286ab9SMauro Carvalho ChehabIf all CPUs and tasks have passed through quiescent states, but the
298f2286ab9SMauro Carvalho Chehabgrace period has nevertheless failed to end, the stall-warning splat
299f2286ab9SMauro Carvalho Chehabwill include something like the following::
300f2286ab9SMauro Carvalho Chehab
301f2286ab9SMauro Carvalho Chehab	All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
302f2286ab9SMauro Carvalho Chehab
303f2286ab9SMauro Carvalho ChehabThe "23807" indicates that it has been more than 23 thousand jiffies
304f2286ab9SMauro Carvalho Chehabsince the grace-period kthread ran.  The "jiffies_till_next_fqs"
305f2286ab9SMauro Carvalho Chehabindicates how frequently that kthread should run, giving the number
306f2286ab9SMauro Carvalho Chehabof jiffies between force-quiescent-state scans, in this case three,
307f2286ab9SMauro Carvalho Chehabwhich is way less than 23807.  Finally, the root rcu_node structure's
308f2286ab9SMauro Carvalho Chehab->qsmask field is printed, which will normally be zero.
309f2286ab9SMauro Carvalho Chehab
310f2286ab9SMauro Carvalho ChehabIf the relevant grace-period kthread has been unable to run prior to
311f2286ab9SMauro Carvalho Chehabthe stall warning, as was the case in the "All QSes seen" line above,
312f2286ab9SMauro Carvalho Chehabthe following additional line is printed::
313f2286ab9SMauro Carvalho Chehab
31499c0974fSPaul E. McKenney	rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
31599c0974fSPaul E. McKenney	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
316f2286ab9SMauro Carvalho Chehab
317f2286ab9SMauro Carvalho ChehabStarving the grace-period kthreads of CPU time can of course result
318f2286ab9SMauro Carvalho Chehabin RCU CPU stall warnings even when all CPUs and tasks have passed
319f2286ab9SMauro Carvalho Chehabthrough the required quiescent states.  The "g" number shows the current
320f2286ab9SMauro Carvalho Chehabgrace-period sequence number, the "f" precedes the ->gp_flags command
321f2286ab9SMauro Carvalho Chehabto the grace-period kthread, the "RCU_GP_WAIT_FQS" indicates that the
322f2286ab9SMauro Carvalho Chehabkthread is waiting for a short timeout, the "state" precedes value of the
323f2286ab9SMauro Carvalho Chehabtask_struct ->state field, and the "cpu" indicates that the grace-period
324f2286ab9SMauro Carvalho Chehabkthread last ran on CPU 5.
325f2286ab9SMauro Carvalho Chehab
326683954e5SNeeraj UpadhyayIf the relevant grace-period kthread does not wake from FQS wait in a
327683954e5SNeeraj Upadhyayreasonable time, then the following additional line is printed::
328683954e5SNeeraj Upadhyay
329683954e5SNeeraj Upadhyay	kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
330683954e5SNeeraj Upadhyay
331683954e5SNeeraj UpadhyayThe "23804" indicates that kthread's timer expired more than 23 thousand
332683954e5SNeeraj Upadhyayjiffies ago.  The rest of the line has meaning similar to the kthread
333683954e5SNeeraj Upadhyaystarvation case.
334683954e5SNeeraj Upadhyay
335683954e5SNeeraj UpadhyayAdditionally, the following line is printed::
336683954e5SNeeraj Upadhyay
337683954e5SNeeraj Upadhyay	Possible timer handling issue on cpu=4 timer-softirq=11142
338683954e5SNeeraj Upadhyay
339683954e5SNeeraj UpadhyayHere "cpu" indicates that the grace-period kthread last ran on CPU 4,
340683954e5SNeeraj Upadhyaywhere it queued the fqs timer.  The number following the "timer-softirq"
341683954e5SNeeraj Upadhyayis the current ``TIMER_SOFTIRQ`` count on cpu 4.  If this value does not
342683954e5SNeeraj Upadhyaychange on successive RCU CPU stall warnings, there is further reason to
343683954e5SNeeraj Upadhyaysuspect a timer problem.
344683954e5SNeeraj Upadhyay
34599c0974fSPaul E. McKenneyThese messages are usually followed by stack dumps of the CPUs and tasks
34699c0974fSPaul E. McKenneyinvolved in the stall.  These stack traces can help you locate the cause
34799c0974fSPaul E. McKenneyof the stall, keeping in mind that the CPU detecting the stall will have
34899c0974fSPaul E. McKenneyan interrupt frame that is mainly devoted to detecting the stall.
34999c0974fSPaul E. McKenney
350f2286ab9SMauro Carvalho Chehab
351f2286ab9SMauro Carvalho ChehabMultiple Warnings From One Stall
352f2286ab9SMauro Carvalho Chehab================================
353f2286ab9SMauro Carvalho Chehab
35499c0974fSPaul E. McKenneyIf a stall lasts long enough, multiple stall-warning messages will
35599c0974fSPaul E. McKenneybe printed for it.  The second and subsequent messages are printed at
356f2286ab9SMauro Carvalho Chehablonger intervals, so that the time between (say) the first and second
357f2286ab9SMauro Carvalho Chehabmessage will be about three times the interval between the beginning
35899c0974fSPaul E. McKenneyof the stall and the first message.  It can be helpful to compare the
35999c0974fSPaul E. McKenneystack dumps for the different messages for the same stalled grace period.
360f2286ab9SMauro Carvalho Chehab
361f2286ab9SMauro Carvalho Chehab
362f2286ab9SMauro Carvalho ChehabStall Warnings for Expedited Grace Periods
363f2286ab9SMauro Carvalho Chehab==========================================
364f2286ab9SMauro Carvalho Chehab
365f2286ab9SMauro Carvalho ChehabIf an expedited grace period detects a stall, it will place a message
366f2286ab9SMauro Carvalho Chehablike the following in dmesg::
367f2286ab9SMauro Carvalho Chehab
368f2286ab9SMauro Carvalho Chehab	INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
369f2286ab9SMauro Carvalho Chehab
370f2286ab9SMauro Carvalho ChehabThis indicates that CPU 7 has failed to respond to a reschedule IPI.
371f2286ab9SMauro Carvalho ChehabThe three periods (".") following the CPU number indicate that the CPU
372f2286ab9SMauro Carvalho Chehabis online (otherwise the first period would instead have been "O"),
373f2286ab9SMauro Carvalho Chehabthat the CPU was online at the beginning of the expedited grace period
374f2286ab9SMauro Carvalho Chehab(otherwise the second period would have instead been "o"), and that
375f2286ab9SMauro Carvalho Chehabthe CPU has been online at least once since boot (otherwise, the third
376f2286ab9SMauro Carvalho Chehabperiod would instead have been "N").  The number before the "jiffies"
377f2286ab9SMauro Carvalho Chehabindicates that the expedited grace period has been going on for 21,119
378f2286ab9SMauro Carvalho Chehabjiffies.  The number following the "s:" indicates that the expedited
379f2286ab9SMauro Carvalho Chehabgrace-period sequence counter is 73.  The fact that this last value is
380f2286ab9SMauro Carvalho Chehabodd indicates that an expedited grace period is in flight.  The number
381f2286ab9SMauro Carvalho Chehabfollowing "root:" is a bitmask that indicates which children of the root
382f2286ab9SMauro Carvalho Chehabrcu_node structure correspond to CPUs and/or tasks that are blocking the
383f2286ab9SMauro Carvalho Chehabcurrent expedited grace period.  If the tree had more than one level,
384f2286ab9SMauro Carvalho Chehabadditional hex numbers would be printed for the states of the other
385f2286ab9SMauro Carvalho Chehabrcu_node structures in the tree.
386f2286ab9SMauro Carvalho Chehab
387f2286ab9SMauro Carvalho ChehabAs with normal grace periods, PREEMPT_RCU builds can be stalled by
388f2286ab9SMauro Carvalho Chehabtasks as well as by CPUs, and that the tasks will be indicated by PID,
389f2286ab9SMauro Carvalho Chehabfor example, "P3421".
390f2286ab9SMauro Carvalho Chehab
391f2286ab9SMauro Carvalho ChehabIt is entirely possible to see stall warnings from normal and from
392f2286ab9SMauro Carvalho Chehabexpedited grace periods at about the same time during the same run.
3937a21ddf0SZhen Lei
3947a21ddf0SZhen LeiRCU_CPU_STALL_CPUTIME
3957a21ddf0SZhen Lei=====================
3967a21ddf0SZhen Lei
3977a21ddf0SZhen LeiIn kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
3987a21ddf0SZhen Leircupdate.rcu_cpu_stall_cputime=1, the following additional information
3997a21ddf0SZhen Leiis supplied with each RCU CPU stall warning::
4007a21ddf0SZhen Lei
4017a21ddf0SZhen Lei  rcu:          hardirqs   softirqs   csw/system
4027a21ddf0SZhen Lei  rcu:  number:      624         45            0
4037a21ddf0SZhen Lei  rcu: cputime:       69          1         2425   ==> 2500(ms)
4047a21ddf0SZhen Lei
4057a21ddf0SZhen LeiThese statistics are collected during the sampling period. The values
4067a21ddf0SZhen Leiin row "number:" are the number of hard interrupts, number of soft
4077a21ddf0SZhen Leiinterrupts, and number of context switches on the stalled CPU. The
4087a21ddf0SZhen Leifirst three values in row "cputime:" indicate the CPU time in
4097a21ddf0SZhen Leimilliseconds consumed by hard interrupts, soft interrupts, and tasks
4107a21ddf0SZhen Leion the stalled CPU.  The last number is the measurement interval, again
4117a21ddf0SZhen Leiin milliseconds.  Because user-mode tasks normally do not cause RCU CPU
4127a21ddf0SZhen Leistalls, these tasks are typically kernel tasks, which is why only the
4137a21ddf0SZhen Leisystem CPU time are considered.
4147a21ddf0SZhen Lei
415*5e013dc1SZhen LeiThe sampling period is shown as follows::
4167a21ddf0SZhen Lei
417*5e013dc1SZhen Lei  |<------------first timeout---------->|<-----second timeout----->|
418*5e013dc1SZhen Lei  |<--half timeout-->|<--half timeout-->|                          |
419*5e013dc1SZhen Lei  |                  |<--first period-->|                          |
420*5e013dc1SZhen Lei  |                  |<-----------second sampling period---------->|
421*5e013dc1SZhen Lei  |                  |                  |                          |
422*5e013dc1SZhen Lei             snapshot time point    1st-stall                  2nd-stall
4237a21ddf0SZhen Lei
4247a21ddf0SZhen LeiThe following describes four typical scenarios:
4257a21ddf0SZhen Lei
426*5e013dc1SZhen Lei1. A CPU looping with interrupts disabled.
427*5e013dc1SZhen Lei
428*5e013dc1SZhen Lei   ::
4297a21ddf0SZhen Lei
4307a21ddf0SZhen Lei     rcu:          hardirqs   softirqs   csw/system
4317a21ddf0SZhen Lei     rcu:  number:        0          0            0
4327a21ddf0SZhen Lei     rcu: cputime:        0          0            0   ==> 2500(ms)
4337a21ddf0SZhen Lei
4347a21ddf0SZhen Lei   Because interrupts have been disabled throughout the measurement
4357a21ddf0SZhen Lei   interval, there are no interrupts and no context switches.
4367a21ddf0SZhen Lei   Furthermore, because CPU time consumption was measured using interrupt
4377a21ddf0SZhen Lei   handlers, the system CPU consumption is misleadingly measured as zero.
4387a21ddf0SZhen Lei   This scenario will normally also have "(0 ticks this GP)" printed on
4397a21ddf0SZhen Lei   this CPU's summary line.
4407a21ddf0SZhen Lei
4417a21ddf0SZhen Lei2. A CPU looping with bottom halves disabled.
4427a21ddf0SZhen Lei
4437a21ddf0SZhen Lei   This is similar to the previous example, but with non-zero number of
4447a21ddf0SZhen Lei   and CPU time consumed by hard interrupts, along with non-zero CPU
445*5e013dc1SZhen Lei   time consumed by in-kernel execution::
4467a21ddf0SZhen Lei
4477a21ddf0SZhen Lei     rcu:          hardirqs   softirqs   csw/system
4487a21ddf0SZhen Lei     rcu:  number:      624          0            0
4497a21ddf0SZhen Lei     rcu: cputime:       49          0         2446   ==> 2500(ms)
4507a21ddf0SZhen Lei
4517a21ddf0SZhen Lei   The fact that there are zero softirqs gives a hint that these were
4527a21ddf0SZhen Lei   disabled, perhaps via local_bh_disable().  It is of course possible
4537a21ddf0SZhen Lei   that there were no softirqs, perhaps because all events that would
4547a21ddf0SZhen Lei   result in softirq execution are confined to other CPUs.  In this case,
4557a21ddf0SZhen Lei   the diagnosis should continue as shown in the next example.
4567a21ddf0SZhen Lei
4577a21ddf0SZhen Lei3. A CPU looping with preemption disabled.
4587a21ddf0SZhen Lei
459*5e013dc1SZhen Lei   Here, only the number of context switches is zero::
4607a21ddf0SZhen Lei
4617a21ddf0SZhen Lei     rcu:          hardirqs   softirqs   csw/system
4627a21ddf0SZhen Lei     rcu:  number:      624         45            0
4637a21ddf0SZhen Lei     rcu: cputime:       69          1         2425   ==> 2500(ms)
4647a21ddf0SZhen Lei
4657a21ddf0SZhen Lei   This situation hints that the stalled CPU was looping with preemption
4667a21ddf0SZhen Lei   disabled.
4677a21ddf0SZhen Lei
468*5e013dc1SZhen Lei4. No looping, but massive hard and soft interrupts.
469*5e013dc1SZhen Lei
470*5e013dc1SZhen Lei   ::
4717a21ddf0SZhen Lei
4727a21ddf0SZhen Lei     rcu:          hardirqs   softirqs   csw/system
4737a21ddf0SZhen Lei     rcu:  number:       xx         xx            0
4747a21ddf0SZhen Lei     rcu: cputime:       xx         xx            0   ==> 2500(ms)
4757a21ddf0SZhen Lei
4767a21ddf0SZhen Lei   Here, the number and CPU time of hard interrupts are all non-zero,
4777a21ddf0SZhen Lei   but the number of context switches and the in-kernel CPU time consumed
4787a21ddf0SZhen Lei   are zero. The number and cputime of soft interrupts will usually be
4797a21ddf0SZhen Lei   non-zero, but could be zero, for example, if the CPU was spinning
4807a21ddf0SZhen Lei   within a single hard interrupt handler.
4817a21ddf0SZhen Lei
4827a21ddf0SZhen Lei   If this type of RCU CPU stall warning can be reproduced, you can
4837a21ddf0SZhen Lei   narrow it down by looking at /proc/interrupts or by writing code to
4847a21ddf0SZhen Lei   trace each interrupt, for example, by referring to show_interrupts().
485