#
a6c11c0a |
| 22-May-2024 |
Dongli Zhang <dongli.zhang@oracle.com> |
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of interrupt affinity reconfiguration via procfs. Instead, the
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of interrupt affinity reconfiguration via procfs. Instead, the change is deferred until the next instance of the interrupt being triggered on the original CPU.
When the interrupt next triggers on the original CPU, the new affinity is enforced within __irq_move_irq(). A vector is allocated from the new CPU, but the old vector on the original CPU remains and is not immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU, irq_complete_move() adds a task to the old CPU's vector_cleanup list if it remains online. Subsequently, the timer on the old CPU iterates over its vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU to reclaim the old apicd->prev_vector because the interrupt isn't currently affine to the outgoing CPU, and irq_needs_fixup() returns false. Even though __vector_schedule_cleanup() is later called on the new CPU, it doesn't reclaim apicd->prev_vector; instead, it simply resets both apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move() before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well, following a warning message, although theoretically it should never see apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
show more ...
|
#
fef05a07 |
| 23-Apr-2024 |
Jacob Pan <jacob.jun.pan@linux.intel.com> |
x86/irq: Factor out common code for checking pending interrupts
Use a common function for checking pending interrupt vector in APIC IRR instead of duplicated open coding them.
Additional checks for
x86/irq: Factor out common code for checking pending interrupts
Use a common function for checking pending interrupt vector in APIC IRR instead of duplicated open coding them.
Additional checks for posted MSI vectors can then be contained in this function.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240423174114.526704-10-jacob.jun.pan@linux.intel.com
show more ...
|
#
54aa699e |
| 03-Jan-2024 |
Bjorn Helgaas <bhelgaas@google.com> |
arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@k
arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
show more ...
|
#
28b82352 |
| 09-Aug-2023 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/apic: Wrap IPI calls into helper functions
Move them to one place so the static call conversion gets simpler.
No functional change.
[ dhansen: merge against recent x86/apic changes ]
Signed-o
x86/apic: Wrap IPI calls into helper functions
Move them to one place so the static call conversion gets simpler.
No functional change.
[ dhansen: merge against recent x86/apic changes ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
670c04ad |
| 09-Aug-2023 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/apic: Nuke ack_APIC_irq()
Yet another wrapper of a wrapper gone along with the outdated comment that this compiles to a single instruction.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> S
x86/apic: Nuke ack_APIC_irq()
Yet another wrapper of a wrapper gone along with the outdated comment that this compiles to a single instruction.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
9132d720 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Wrap APIC ID validation into an inline
Prepare for removing the callback and making this as simple comparison to an upper limit, which is the obvious solution to do for limit checks...
Si
x86/apic: Wrap APIC ID validation into an inline
Prepare for removing the callback and making this as simple comparison to an upper limit, which is the obvious solution to do for limit checks...
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
a6625b47 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Get rid of hard_smp_processor_id()
No point in having a wrapper around read_apic_id().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.in
x86/apic: Get rid of hard_smp_processor_id()
No point in having a wrapper around read_apic_id().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
49062454 |
| 08-Aug-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Rename disable_apic
It reflects a state and not a command. Make it bool while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.inte
x86/apic: Rename disable_apic
It reflects a state and not a command. Make it bool while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
show more ...
|
#
bdc1dad2 |
| 21-Jun-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback
The left overs of a moved interrupt are cleaned up once the interrupt is raised on the new target CPU. Keeping the vector valid on t
x86/vector: Replace IRQ_MOVE_CLEANUP_VECTOR with a timer callback
The left overs of a moved interrupt are cleaned up once the interrupt is raised on the new target CPU. Keeping the vector valid on the original target CPU guarantees that there can't be an interrupt lost if the affinity change races with an concurrent interrupt from the device.
This cleanup utilizes the lowest priority interrupt vector for this cleanup, which makes sure that in the unlikely case when the to be cleaned up interrupt is pending in the local APICs IRR the cleanup vector does not live lock.
But there is no real reason to use an interrupt vector for cleaning up the leftovers of a moved interrupt. It's not a high performance operation. The only requirement is that it happens on the original target CPU.
Convert it to use a timer instead and adjust the code accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230621171248.6805-3-xin3.li@intel.com
show more ...
|
#
a539cc86 |
| 21-Jun-2023 |
Thomas Gleixner <tglx@linutronix.de> |
x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup()
Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for replacing the vector cleanup IPI with a timer callback.
x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup()
Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for replacing the vector cleanup IPI with a timer callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20230621171248.6805-2-xin3.li@intel.com
show more ...
|
#
d474d92d |
| 11-Nov-2022 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS
Now that the PCI/MSI core code does early checking for multi-MSI support X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.
Remove the flag
x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS
Now that the PCI/MSI core code does early checking for multi-MSI support X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.
Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
show more ...
|
#
749443de |
| 14-Aug-2021 |
Yury Norov <yury.norov@gmail.com> |
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
A couple of kernel functions call for_each_*_bit_from() with start bit equal to 0. Replace them with for_each_*_bit().
No funct
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
A couple of kernel functions call for_each_*_bit_from() with start bit equal to 0. Replace them with for_each_*_bit().
No functional changes, but might improve on readability.
Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
show more ...
|
#
d2531661 |
| 20-Jul-2021 |
Maciej W. Rozycki <macro@orcam.me.uk> |
x86: Avoid magic number with ELCR register accesses
Define PIC_ELCR1 and PIC_ELCR2 macros for accesses to the ELCR registers implemented by many chipsets in their embedded 8259A PIC cores, avoiding
x86: Avoid magic number with ELCR register accesses
Define PIC_ELCR1 and PIC_ELCR2 macros for accesses to the ELCR registers implemented by many chipsets in their embedded 8259A PIC cores, avoiding magic numbers that are difficult to handle, and complementing the macros we already have for registers originally defined with discrete 8259A PIC implementations. No functional change.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/alpine.DEB.2.21.2107200237300.9461@angie.orcam.me.uk
show more ...
|
#
7d65f9e8 |
| 25-May-2021 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated v
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case.
When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors.
As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty.
Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors.
Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available.
Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
show more ...
|
#
9a98bc2c |
| 18-Mar-2021 |
Thomas Gleixner <tglx@linutronix.de> |
x86/vector: Add a sanity check to prevent IRQ2 allocations
To prevent another incidental removal of the IRQ2 ignore logic in the IO/APIC code going unnoticed add a sanity check. Add some commentry a
x86/vector: Add a sanity check to prevent IRQ2 allocations
To prevent another incidental removal of the IRQ2 ignore logic in the IO/APIC code going unnoticed add a sanity check. Add some commentry at the other place which ignores IRQ2 while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210318192819.795280387@linutronix.de
show more ...
|
#
d9f6e12f |
| 18-Mar-2021 |
Ingo Molnar <mingo@kernel.org> |
x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: B
x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments.
Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
show more ...
|
#
190113b4 |
| 10-Dec-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/apic/vector: Fix ordering in vector assignment
Prarit reported that depending on the affinity setting the
' irq $N: Affinity broken due to vector space exhaustion.'
message is showing up in d
x86/apic/vector: Fix ordering in vector assignment
Prarit reported that depending on the affinity setting the
' irq $N: Affinity broken due to vector space exhaustion.'
message is showing up in dmesg, but the vector space on the CPUs in the affinity mask is definitely not exhausted.
Shung-Hsi provided traces and analysis which pinpoints the problem:
The ordering of trying to assign an interrupt vector in assign_irq_vector_any_locked() is simply wrong if the interrupt data has a valid node assigned. It does:
1) Try the intersection of affinity mask and node mask 2) Try the node mask 3) Try the full affinity mask 4) Try the full online mask
Obviously #2 and #3 are in the wrong order as the requested affinity mask has to take precedence.
In the observed cases #1 failed because the affinity mask did not contain CPUs from node 0. That made it allocate a vector from node 0, thereby breaking affinity and emitting the misleading message.
Revert the order of #2 and #3 so the full affinity mask without the node intersection is tried before actually affinity is broken.
If no node is assigned then only the full affinity mask and if that fails the full online mask is tried.
Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor") Reported-by: Prarit Bhargava <prarit@redhat.com> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
show more ...
|
#
6452ea2a |
| 24-Oct-2020 |
David Woodhouse <dwmw@amazon.co.uk> |
x86/apic: Add select() method on vector irqdomain
This will be used to select the irqdomain for I/O-APIC and HPET.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner
x86/apic: Add select() method on vector irqdomain
This will be used to select the irqdomain for I/O-APIC and HPET.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201024213535.443185-24-dwmw2@infradead.org
show more ...
|
#
f598181a |
| 24-Oct-2020 |
David Woodhouse <dwmw@amazon.co.uk> |
x86/apic: Always provide irq_compose_msi_msg() method for vector domain
This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver interrupts through MSI without having any PCI in the sys
x86/apic: Always provide irq_compose_msi_msg() method for vector domain
This shouldn't be dependent on PCI_MSI. HPET and I/O-APIC can deliver interrupts through MSI without having any PCI in the system at all.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201024213535.443185-10-dwmw2@infradead.org
show more ...
|
#
6b15ffa0 |
| 26-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/irq: Initialize PCI/MSI domain at PCI init time
No point in initializing the default PCI/MSI interrupt domain early and no point to create it when XEN PV/HVM/DOM0 are active.
Move the initializ
x86/irq: Initialize PCI/MSI domain at PCI init time
No point in initializing the default PCI/MSI interrupt domain early and no point to create it when XEN PV/HVM/DOM0 are active.
Move the initialization to pci_arch_init() and convert it to init ops so that XEN can override it as XEN has it's own PCI/MSI management. The XEN override comes in a later step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.859209894@linutronix.de
show more ...
|
#
b0a19555 |
| 26-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/msi: Move compose message callback where it belongs
Composing the MSI message at the MSI chip level is wrong because the underlying parent domain is the one which knows how the message should be
x86/msi: Move compose message callback where it belongs
Composing the MSI message at the MSI chip level is wrong because the underlying parent domain is the one which knows how the message should be composed for the direct vector delivery or the interrupt remapping table entry.
The interrupt remapping aware PCI/MSI chip does that already. Make the direct delivery chip do the same and move the composition of the direct delivery MSI message to the vector domain irq chip.
This prepares for the upcoming device MSI support to avoid having architecture specific knowledge in the device MSI domain irq chips.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112331.157603198@linutronix.de
show more ...
|
#
e027ffff |
| 26-Aug-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/irq: Unbreak interrupt affinity setting
Several people reported that 5.8 broke the interrupt affinity setting mechanism.
The consolidation of the entry code reused the regular exception entry c
x86/irq: Unbreak interrupt affinity setting
Several people reported that 5.8 broke the interrupt affinity setting mechanism.
The consolidation of the entry code reused the regular exception entry code for device interrupts and changed the way how the vector number is conveyed from ptregs->orig_ax to a function argument.
The low level entry uses the hardware error code slot to push the vector number onto the stack which is retrieved from there into a function argument and the slot on stack is set to -1.
The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry.
But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made.
Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1.
That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens.
There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of X86 which supported multi-CPU affinities. Under this model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required.
Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask or it is moved to a different CPU, but it is never moved to a different vector on the same CPU.
Ergo the cleanup check for the matching vector number is not required and can be removed which makes the dependency on pt_regs:orig_ax go away.
The remaining check for new_cpu == smp_processsor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed.
For paranoia sake add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds.
Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs") Reported-by: Alex bykov <alex.bykov@scylladb.com> Reported-by: Avi Kivity <avi@scylladb.com> Reported-by: Alexander Graf <graf@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexander Graf <graf@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
show more ...
|
#
f0c7baca |
| 24-Jul-2020 |
Thomas Gleixner <tglx@linutronix.de> |
genirq/affinity: Make affinity setting if activated opt-in
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis:
"It looks like what h
genirq/affinity: Make affinity setting if activated opt-in
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis:
"It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING.
Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU."
This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in.
Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ...
Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
show more ...
|
#
baedb87d |
| 17-Jul-2020 |
Thomas Gleixner <tglx@linutronix.de> |
genirq/affinity: Handle affinity setting on inactive interrupts correctly
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code s
genirq/affinity: Handle affinity setting on inactive interrupts correctly
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests.
X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS.
Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then:
- Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver
This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on.
Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations.
For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design.
Remove the X86 workaround as it is not longer required.
Fixes: 02edee152d6e ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: Ali Saidi <alisaidi@amazon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
show more ...
|
#
e3beca48 |
| 09-Jul-2020 |
Thomas Gleixner <tglx@linutronix.de> |
irqdomain/treewide: Keep firmware node unconditionally allocated
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free t
irqdomain/treewide: Keep firmware node unconditionally allocated
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free.
Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this.
Fixes: 711419e504eb ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
show more ...
|