#
75060b6e |
| 06-Mar-2024 |
Thomas Weißschuh <linux@weissschuh.net> |
watchdog/core: remove sysctl handlers from public header
The functions are only used in the file where they are defined. Remove them from the header and make them static.
Also guard proc_soft_watc
watchdog/core: remove sysctl handlers from public header
The functions are only used in the file where they are defined. Remove them from the header and make them static.
Also guard proc_soft_watchdog with a #define-guard as it is not used otherwise.
Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
55efe4ab |
| 20-Dec-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog: if panicking and we dumped everything, don't re-enable dumping
If, as part of handling a hardlockup or softlockup, we've already dumped all CPUs and we're just about to panic, don't reenab
watchdog: if panicking and we dumped everything, don't re-enable dumping
If, as part of handling a hardlockup or softlockup, we've already dumped all CPUs and we're just about to panic, don't reenable dumping and give some other CPU a chance to hop in there and add some confusing logs right as the panic is happening.
Link: https://lkml.kernel.org/r/20231220131534.4.Id3a9c7ec2d7d83e4080da6f8662ba2226b40543f@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ee6bdb3f |
| 20-Dec-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved which is hard to read.
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved which is hard to read.
The interleaving problem was especially bad with the "perf" hardlockup detector where the locked up CPU is always the same as the running CPU and we end up in show_regs(). show_regs() has no inherent serialization so we could mix together two crawls if two hardlockups happened at the same time (and if we didn't have `sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll fully serialize hardlockups when using the "perf" hardlockup detector.
The interleaving problem was less bad with the "buddy" hardlockup detector. With "buddy" we always end up calling `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running one. trigger_single_cpu_backtrace() always at least serializes the individual stack crawls because it eventually uses printk_cpu_sync_get_irqsave(). Unfortunately the fact that trigger_single_cpu_backtrace() eventually calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we have to drop the "lock" before calling it and we can't fully serialize all printouts associated with a given hardlockup. However, we still do get the advantage of serializing the output of print_modules() and print_irqtrace_events().
Aside from serializing hardlockups from each other, this change also has the advantage of serializing hardlockups and softlockups from each other if they happen to happen at the same time since they are both using the same "lock".
Even though nobody is expected to hang while holding the lock associated with printk_cpu_sync_get_irqsave(), out of an abundance of caution, we don't call printk_cpu_sync_get_irqsave() until after we print out about the hardlockup. This makes extra sure that, even if printk_cpu_sync_get_irqsave() somehow never runs we at least print that we saw the hardlockup. This is different than the choice made for softlockup because hardlockup is really our last resort.
Link: https://lkml.kernel.org/r/20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
896260a6 |
| 20-Dec-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() to serialize s
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() to serialize softlockup reporting. Alone this doesn't have any real advantage over the spinlock, but this will allow us to use the same function in a future change to also serialize hardlockup crawls.
NOTE: for the most part this serialization is important because we often end up in the show_regs() path and that has no built-in serialization if there are multiple callers at once. However, even in the case where we end up in the dump_stack() path this still has some advantages because the stack will be guaranteed to be together in the logs with the lockup message with no interleaving.
NOTE: the fact that printk_cpu_sync_get_irqsave() is allowed to be called multiple times on the same CPU is important here. Specifically we hold the "lock" while calling dump_stack() which also gets the same "lock". This is explicitly documented to be OK and means we don't need to introduce a variant of dump_stack() that doesn't grab the lock.
Link: https://lkml.kernel.org/r/20231220131534.2.Ia5906525d440d8e8383cde31b7c61c2aadc8f907@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Li Zhe <lizhe.67@bytedance.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
6dcde5d5 |
| 20-Dec-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
Patch series "watchdog: Better handling of concurrent lockups".
When we get multiple lockups at roughly the same time, the output i
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
Patch series "watchdog: Better handling of concurrent lockups".
When we get multiple lockups at roughly the same time, the output in the kernel logs can be very confusing since the reports about the lockups end up interleaved in the logs. There is some code in the kernel to try to handle this but it wasn't that complete.
Li Zhe recently made this a bit better for softlockups (specifically for the case where `kernel.softlockup_all_cpu_backtrace` is not set) in commit 9d02330abd3e ("softlockup: serialized softlockup's log"), but that only handled softlockup reports. Hardlockup reports still had similar issues.
This series also has a small fix to avoid dumping all stacks a second time in the case of a panic. This is a bit unrelated to the interleaving fixes but it does also improve the clarity of lockup reports.
This patch (of 4):
The hardlockup detector and softlockup detector both have the ability to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace` and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have some logic to attempt to avoid interleaving printouts if two CPUs were trying to do dumps of all CPUs at the same time. However:
- The hardlockup detector's logic still allowed interleaving some information. Specifically another CPU could print modules and dump the stack of the locked CPU at the same time we were dumping all CPUs.
- In the case where `kernel.hardlockup_panic` was set in addition to `kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected hardlockups at the same time the second CPU could call panic() while the first was still dumping stacks. This was especially bad if the locked up CPU wasn't responding to the request for a backtrace since the function nmi_trigger_cpumask_backtrace() can wait up to 10 seconds.
Let's resolve this by adopting the softlockup logic in the hardlockup handler.
NOTES:
- As part of this, one might think that we should make a helper function that both the hard and softlockup detectors call. This turns out not to be super trivial since it would have to be parameterized quite a bit since there are separate global variables controlling each lockup detector and they print log messages that are just different enough that it would be a pain. We probably don't want to change the messages that are printed without good reason to avoid throwing log parsers for a loop.
- One might also think that it would be a good idea to have the hardlockup and softlockup detector use the same global variable to prevent interleaving. This would make sure that softlockups and hardlockups can't interleave each other. That _almost_ works but has a dangerous flaw if `kernel.hardlockup_panic` is not the same as `kernel.softlockup_panic` because we might skip a call to panic() if one type of lockup was detected at the same time as another.
Link: https://lkml.kernel.org/r/20231220211640.2023645-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
9d02330a |
| 23-Nov-2023 |
Li Zhe <lizhe.67@bytedance.com> |
softlockup: serialized softlockup's log
If multiple CPUs trigger softlockup at the same time with 'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear staggeredly in dmesg, which will
softlockup: serialized softlockup's log
If multiple CPUs trigger softlockup at the same time with 'softlockup_all_cpu_backtrace=0', the softlockup's logs will appear staggeredly in dmesg, which will affect the viewing of the logs for developer. Since the code path for outputting softlockup logs is not a kernel hotspot and the performance requirements for the code are not strict, locks are used to serialize the softlockup log output to improve the readability of the logs.
Link: https://lkml.kernel.org/r/20231123084022.10302-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: John Ogness <john.ogness@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
8b793bcd |
| 27-Oct-2023 |
Krister Johansen <kjlx@templeofstupid.com> |
watchdog: move softlockup_panic back to early_param
Setting softlockup_panic from do_sysctl_args() causes it to take effect later in boot. The lockup detector is enabled before SMP is brought onlin
watchdog: move softlockup_panic back to early_param
Setting softlockup_panic from do_sysctl_args() causes it to take effect later in boot. The lockup detector is enabled before SMP is brought online, but do_sysctl_args runs afterwards. If a user wants to set softlockup_panic on boot and have it trigger should a softlockup occur during onlining of the non-boot processors, they could do this prior to commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases"). However, after this commit the value of softlockup_panic is set too late to be of help for this type of problem. Restore the prior behavior.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Cc: stable@vger.kernel.org Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
#
1f38c86b |
| 04-Aug-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started storing a
watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started storing a `struct cpumask` on the stack in watchdog_hardlockup_check(). On systems with CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That triggers warnings with `CONFIG_FRAME_WARN` set to 1024.
We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to use a CPU mask at all.
Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
8d539b84 |
| 04-Aug-2023 |
Douglas Anderson <dianders@chromium.org> |
nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a pl
nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case.
Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask.
Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change.
[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
47f4cb43 |
| 16-Jun-2023 |
Petr Mladek <pmladek@suse.com> |
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix. It will be
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
The HAVE_ prefix means that the code could be enabled. Add another variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix. It will be set when it should be built. It will make it compatible with the other hardlockup detectors.
Before, it is far from obvious that the SPARC64 variant is actually used:
$> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
After, it is more clear:
$> make ARCH=sparc64 defconfig $> grep HARDLOCKUP_DETECTOR .config CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y
Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
a5fcc236 |
| 16-Jun-2023 |
Petr Mladek <pmladek@suse.com> |
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one.
C
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36. It was a preparation step for introducing the new generic perf hardlockup detector.
The existing arch-specific variants did not support the to-be-created generic build configurations, sysctl interface, etc. This distinction was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR") in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105 ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely dropped some time ago. And sparc is the only architecture with the historic NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added in v4.10-rc1.
There are only few locations where the sparc64 NMI watchdog interacts with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support /proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific functions are needed in sparc-specific code which includes only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3 ("kernel/watchdog: split up config options") and commit 2104180a53698df5 ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1. They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc specific hardlockup detector. It was compatible with the perf one regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of HAVE_NMI_WATCHDOG. It made some sense because all arch-specific detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog() + included asm/nmi.h into linux/nmi.h + defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually has made things pretty complicated when the generic buddy hardlockup detector was added. Before the generic perf detector was newer supported together with an arch-specific one. But the buddy detector could work on any SMP system. It means that an architecture could support both the arch-specific and buddy detector.
As a result, there are few tricky dependencies. For example, CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear without reading understanding the history.
Make the logic less tricky and more self-explanatory by making HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF, and HARDLOCKUP_DETECTOR_BUDDY may conflict only with HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR and it is not longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
28168eca |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: move SMP barriers from common code to buddy code
It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hard
watchdog/hardlockup: move SMP barriers from common code to buddy code
It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hardlockup detector, that the barriers belong in the buddy code. Let's move them and add clearer comments about why they're needed.
Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
d3b62ace |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
In the patch ("watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs"), we added a call from the common watchd
watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
In the patch ("watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs"), we added a call from the common watchdog.c file into the buddy. That call could be done more cleanly. Specifically:
1. If we move the call into watchdog_hardlockup_kick() then it keeps watchdog_timer_fn() simpler. 2. We don't need to pass an "unsigned long" to the buddy for the timer count. In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") the count was changed to "atomic_t" which is backed by an int, so we should match types.
Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
7a71d8e6 |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started using a cpumask to keep track
watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started using a cpumask to keep track of which CPUs to backtrace. When setting up this cpumask, it's better to use cpumask_copy() than to just copy the structure directly. Fix this.
Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
2711e4ad |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") there was no reason to use raw_cpu_p
watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
In the patch ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). Using this_cpu_ptr() works fine.
Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
6426e8d1 |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH
watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one architecture, we have some special case code in the watchdog core to handle the fact that watchdog_hardlockup_probe() isn't implemented.
Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the special case.
As a side effect of doing this, code inspection tells us that we could fix a minor bug where the system won't properly realize that NMI watchdogs are disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then nothing will override the "weak" watchdog_hardlockup_probe() and we'll fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.
Link: https://lkml.kernel.org/r/20230526184139.2.Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
9ec272c5 |
| 27-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".
This patch series attempts to finish resolving th
watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".
This patch series attempts to finish resolving the feedback received from Petr Mladek on the v5 series I posted.
Probably the only thing that wasn't fully as clean as Petr requested was the Kconfig stuff. I couldn't find a better way to express it without a more major overhaul. In the very least, I renamed "NON_ARCH" to "PERF_OR_BUDDY" in the hopes that will make it marginally better.
Nothing in this series is terribly critical and even the bugfixes are small. However, it does cleanup a few things that were pointed out in review.
This patch (of 10):
The permissions for the kernel.nmi_watchdog sysctl have always been set at compile time despite the fact that a watchdog can fail to probe. Let's fix this and set the permissions based on whether the hardlockup detector actually probed.
Link: https://lkml.kernel.org/r/20230527014153.2793931-1-dianders@chromium.org Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reported-by: Petr Mladek <pmladek@suse.com> Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
930d8f8d |
| 19-May-2023 |
Lecopzer Chen <lecopzer.chen@mediatek.com> |
watchdog/perf: adapt the watchdog_perf interface for async model
When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not ready yet. E.g. on arm64, PMU is not ready until device_in
watchdog/perf: adapt the watchdog_perf interface for async model
When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not ready yet. E.g. on arm64, PMU is not ready until device_initcall(armv8_pmu_driver_init). And it is deeply integrated with the driver model and cpuhp. Hence it is hard to push this initialization before smp_init().
But it is easy to take an opposite approach and try to initialize the watchdog once again later. The delayed probe is called using workqueues. It need to allocate memory and must be proceed in a normal context. The delayed probe is able to use if watchdog_hardlockup_probe() returns non-zero which means the return code returned when PMU is not ready yet.
Provide an API - lockup_detector_retry_init() for anyone who needs to delayed init lockup detector if they had ever failed at lockup_detector_init().
The original assumption is: nobody should use delayed probe after lockup_detector_check() which has __init attribute. That is, anyone uses this API must call between lockup_detector_init() and lockup_detector_check(), and the caller must have __init attribute
Link: https://lkml.kernel.org/r/20230519101840.v5.16.If4ad5dd5d09fb1309cebf8bcead4b6a5a7758ca7@changeid Reviewed-by: Petr Mladek <pmladek@suse.com> Co-developed-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
1f423c90 |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't doesn't need any extra arch-specific support code to detect lockups. Instead of us
watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't doesn't need any extra arch-specific support code to detect lockups. Instead of using something arch-specific we will use the buddy system, where each CPU watches out for another one. Specifically, each CPU will use its softlockup hrtimer to check that the next CPU is processing hrtimer interrupts by verifying that a counter is increasing.
NOTE: unlike the other hard lockup detectors, the buddy one can't easily show what's happening on the CPU that locked up just by doing a simple backtrace. It relies on some other mechanism in the system to get information about the locked up CPUs. This could be support for NMI backtraces like [1], it could be a mechanism for printing the PC of locked CPUs at panic time like [2] / [3], or it could be something else. Even though that means we still rely on arch-specific code, this arch-specific code seems to often be implemented even on architectures that don't have a hardlockup detector.
This style of hardlockup detector originated in some downstream Android trees and has been rebased on / carried in ChromeOS trees for quite a long time for use on arm and arm64 boards. Historically on these boards we've leveraged mechanism [2] / [3] to get information about hung CPUs, but we could move to [1].
Although the original motivation for the buddy system was for use on systems without an arch-specific hardlockup detector, it can still be useful to use even on systems that _do_ have an arch-specific hardlockup detector. On x86, for instance, there is a 24-part patch series [4] in progress switching the arch-specific hard lockup detector from a scarce perf counter to a less-scarce hardware resource. Potentially the buddy system could be a simpler alternative to free up the perf counter but still get hard lockup detection.
Overall, pros (+) and cons (-) of the buddy system compared to an arch-specific hardlockup detector (which might be implemented using perf): + The buddy system is usable on systems that don't have an arch-specific hardlockup detector, like arm32 and arm64 (though it's being worked on for arm64 [5]). + The buddy system may free up scarce hardware resources. + If a CPU totally goes out to lunch (can't process NMIs) the buddy system could still detect the problem (though it would be unlikely to be able to get a stack trace). + The buddy system uses the same timer function to pet the hardlockup detector on the running CPU as it uses to detect hardlockups on other CPUs. Compared to other hardlockup detectors, this means it generates fewer interrupts and thus is likely better able to let CPUs stay idle longer. - If all CPUs are hard locked up at the same time the buddy system can't detect it. - If we don't have SMP we can't use the buddy system. - The buddy system needs an arch-specific mechanism (possibly NMI backtrace) to get info about the locked up CPU.
[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org [2] https://issuetracker.google.com/172213129 [3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html [4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/ [5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ian Rogers <irogers@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
d9b3629a |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly
The fact that there watchdog_hardlockup_enable(), watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are d
watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly
The fact that there watchdog_hardlockup_enable(), watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are declared __weak means that the configured hardlockup detector can define non-weak versions of those functions if it needs to. Instead of doing this, the perf hardlockup detector hooked itself into the default __weak implementation, which was a bit awkward. Clean this up.
From comments, it looks as if the original design was done because the __weak function were expected to implemented by the architecture and not by the configured hardlockup detector. This got awkward when we tried to add the buddy lockup detector which was not arch-specific but wanted to hook into those same functions.
This is not expected to have any functional impact.
Link: https://lkml.kernel.org/r/20230519101840.v5.13.I847d9ec852449350997ba00401d2462a9cb4302b@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
df95d308 |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: rename some "NMI watchdog" constants/function
Do a search and replace of: - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENA
watchdog/hardlockup: rename some "NMI watchdog" constants/function
Do a search and replace of: - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED - watchdog_nmi_ => watchdog_hardlockup_ - nmi_watchdog_available => watchdog_hardlockup_available - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT
Then update a few comments near where names were changed.
This is specifically to make it less confusing when we want to introduce the buddy hardlockup detector, which isn't using NMIs. As part of this, we sanitized a few names for consistency.
[trix@redhat.com: make variables static] Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Tom Rix <trix@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
ed92e1ef |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c
In preparation for the buddy hardlockup detector, which wants the same petting logic as the current perf hardlockup detector,
watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c
In preparation for the buddy hardlockup detector, which wants the same petting logic as the current perf hardlockup detector, move the code to watchdog.c. While doing this, rename the global variable to match others nearby. As part of this change we have to change the code to account for the fact that the CPU we're running on might be different than the one we're checking.
Currently the code in watchdog.c is guarded by CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem silly. However, a future patch will change this.
Link: https://lkml.kernel.org/r/20230519101840.v5.11.I00dfd6386ee00da25bf26d140559a41339b53e57@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
77c12fc9 |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()
In preparation for the buddy hardlockup detector where the CPU checking for lockup might not be the currently running CPU, add a
watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()
In preparation for the buddy hardlockup detector where the CPU checking for lockup might not be the currently running CPU, add a "cpu" parameter to watchdog_hardlockup_check().
As part of this change, make hrtimer_interrupts an atomic_t since now the CPU incrementing the value and the CPU reading the value might be different. Technially this could also be done with just READ_ONCE and WRITE_ONCE, but atomic_t feels a little cleaner in this case.
While hrtimer_interrupts is made atomic_t, we change hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is needed to match the data type backing atomic_t for hrtimer_interrupts. Even if this changes us from 64-bits to 32-bits (which I don't think is true for most compilers), it doesn't really matter. All we ever do is increment it every few seconds and compare it to an old value so 32-bits is fine (even 16-bits would be). The "signed" vs "unsigned" also doesn't matter for simple equality comparisons.
hrtimer_interrupts_saved is _not_ switched to atomic_t nor even accessed with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is always consistently accessed with the same CPU. NOTE: with the upcoming "buddy" detector there is one special case. When a CPU goes offline/online then we can change which CPU is the one to consistently access a given instance of hrtimer_interrupts_saved. We still can't end up with a partially updated hrtimer_interrupts_saved, however, because we end up petting all affected CPUs to make sure the new and old CPU can't end up somehow read/write hrtimer_interrupts_saved at the same time.
Link: https://lkml.kernel.org/r/20230519101840.v5.10.I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
1610611a |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: style changes to watchdog_hardlockup_check() / is_hardlockup()
These are tiny style changes: - Add a blank line before a "return". - Renames two globals to use the "watchdog_har
watchdog/hardlockup: style changes to watchdog_hardlockup_check() / is_hardlockup()
These are tiny style changes: - Add a blank line before a "return". - Renames two globals to use the "watchdog_hardlockup" prefix. - Store processor id in "unsigned int" rather than "int". - Minor comment rewording. - Use "else" rather than extra returns since it seemed more symmetric.
Link: https://lkml.kernel.org/r/20230519101840.v5.9.I818492c326b632560b09f20d2608455ecf9d3650@changeid Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
81972551 |
| 19-May-2023 |
Douglas Anderson <dianders@chromium.org> |
watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c
The perf hardlockup detector works by looking at interrupt counts and seeing if they change from run to run. The interr
watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c
The perf hardlockup detector works by looking at interrupt counts and seeing if they change from run to run. The interrupt counts are managed by the common watchdog code via its watchdog_timer_fn().
Currently the API between the perf detector and the common code is a function: is_hardlockup(). When the hard lockup detector sees that function return true then it handles printing out debug info and inducing a panic if necessary.
Let's change the API a little bit in preparation for the buddy hardlockup detector. The buddy hardlockup detector wants to print nearly the same debug info and have nearly the same panic behavior. That means we want to move all that code to the common file. For now, the code in the common file will only be there if the perf hardlockup detector is enabled, but eventually it will be selected by a common config.
Right now, this _just_ moves the code from the perf detector file to the common file and changes the names. It doesn't make the changes that the buddy hardlockup detector will need and doesn't do any style cleanups. A future patch will do cleanup to make it more obvious what changed.
With the above, we no longer have any callers of is_hardlockup() outside of the "watchdog.c" file, so we can remove it from the header, make it static, and move it to the same "#ifdef" block as our new watchdog_hardlockup_check(). While doing this, it can be noted that even if no hardlockup detectors were configured the existing code used to still have the code for counting/checking "hrtimer_interrupts" even if the perf hardlockup detector wasn't configured. We didn't need to do that, so move all the "hrtimer_interrupts" counting to only be there if the perf hardlockup detector is configured as well.
This change is expected to be a no-op.
Link: https://lkml.kernel.org/r/20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Colin Cross <ccross@android.com> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guenter Roeck <groeck@chromium.org> Cc: Ian Rogers <irogers@google.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pingfan Liu <kernelfans@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Sumit Garg <sumit.garg@linaro.org> Cc: Tzung-Bi Shih <tzungbi@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|