# e0b7de4f | 26-Jul-2024 | Sean Christopherson <seanjc@google.com>
KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
Disallow copying MTE tags to guest memory while KVM is dirty logging, as writing guest memory without marking the gfn as dirty in the memslot could result in userspace failing to migrate the updated page. Ideally (maybe?), KVM would simply mark the gfn as dirty, but there is no vCPU to work with, and presumably the only use case for copying MTE tags _to_ the guest is when restoring state on the target.
Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240726235234.228822-3-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
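A minimal sketch of the guard described above, assuming the fix keys off KVM's VM-wide nr_memslots_dirty_logging counter (the exact mechanism in the patch may differ):

    /* In kvm_vm_ioctl_mte_copy_tags(), before writing any guest page: */
    if (write && atomic_read(&kvm->nr_memslots_dirty_logging))
            return -EBUSY;  /* no vCPU context here, so the gfn can't be dirtied */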
# ae41d7db | 26-Jul-2024 | Sean Christopherson <seanjc@google.com>
KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
Put the page reference acquired by gfn_to_pfn_prot() if kvm_vm_ioctl_mte_copy_tags() runs into ZONE_DEVICE memory. KVM's less-than-stellar heuristics for dealing with pfn-mapped memory means that KVM can get a page reference to ZONE_DEVICE memory.
Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20240726235234.228822-2-seanjc@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
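A sketch of the shape of the fix, assuming the ZONE_DEVICE case is detected via pfn_to_online_page() returning NULL:

    page = pfn_to_online_page(pfn);
    if (!page) {
            /* ZONE_DEVICE: drop the reference gfn_to_pfn_prot() acquired. */
            kvm_release_pfn_clean(pfn);
            ret = -EFAULT;
            goto out;
    }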
# dfe6d190 | 24-May-2024 | Marc Zyngier <maz@kernel.org>
KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
It appears that we don't allow a vcpu to be restored in AArch32 System mode, as we *never* included it in the list of valid modes.
Just add it to the list of allowed modes.
Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu")
Cc: stable@vger.kernel.org
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
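Illustrative sketch of the valid-mode check with System mode admitted (other cases abridged):

    switch (mode) {
    case PSR_AA32_MODE_USR:
    case PSR_AA32_MODE_SVC:
    /* ... FIQ/IRQ/ABT/UND ... */
    case PSR_AA32_MODE_SYS:     /* now accepted on restore */
            break;
    default:
            return -EINVAL;
    }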
# 947051e3 | 24-May-2024 | Marc Zyngier <maz@kernel.org>
KVM: arm64: Fix AArch32 register narrowing on userspace write
When userspace writes to one of the core registers, we make sure to narrow the corresponding GPRs if PSTATE indicates an AArch32 context.
The code tries to check whether the context is EL0 or EL1 so that it narrows the correct registers. But it does so by checking the full PSTATE instead of PSTATE.M.
As a consequence, and if we are restoring an AArch32 EL0 context in a 64bit guest, and that PSTATE has *any* bit set outside of PSTATE.M, we narrow *all* registers instead of only the first 15, destroying the 64bit state.
Obviously, this is not something the guest is likely to enjoy.
Correctly masking PSTATE to only evaluate PSTATE.M fixes it.
Fixes: 90c1f934ed71 ("KVM: arm64: Get rid of the AArch32 register mapping code")
Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240524141956.1450304-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
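The crux of the fix, sketched as a diff (num_gprs is a hypothetical local for illustration):

    -	switch (*vcpu_cpsr(vcpu)) {
    +	switch (*vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK) {
    	case PSR_AA32_MODE_USR:
    		num_gprs = 15;	/* EL0: narrow r0-r14 only */
    		break;
    	/* ... EL1 modes: banked registers are narrowed as well ... */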
# 5a00bfd6 | 15-Feb-2024 | Ryan Roberts <ryan.roberts@arm.com>
arm64/mm: new ptep layer to manage contig bit
Create a new layer for the in-table PTE manipulation APIs. For now, the existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API.
The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses.
The following APIs are treated this way:
- ptep_get
- set_pte
- set_ptes
- pte_clear
- ptep_get_and_clear
- ptep_test_and_clear_young
- ptep_clear_flush_young
- ptep_set_wrprotect
- ptep_set_access_flags
Link: https://lkml.kernel.org/r/20240215103205.2607016-11-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
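A sketch of the layering this describes, following the naming in the message (bodies abridged):

    /* Arch-private API: always operates on a single, real pte. */
    static inline pte_t __ptep_get(pte_t *ptep)
    {
            return READ_ONCE(*ptep);
    }

    /* Public API: a thin wrapper today, contpte-aware later. */
    #define ptep_get ptep_get
    static inline pte_t ptep_get(pte_t *ptep)
    {
            return __ptep_get(ptep);
    }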
# 659e1930 | 15-Feb-2024 | Ryan Roberts <ryan.roberts@arm.com>
arm64/mm: convert set_pte_at() to set_ptes(..., 1)
Since set_ptes() was introduced, set_pte_at() has been implemented as a generic macro around set_ptes(..., 1). So this change should continue to generate the same code. However, making this change prepares us for the transparent contpte support. It means we can reroute set_ptes() to __set_ptes(). Since set_pte_at() is a generic macro, there will be no equivalent __set_pte_at() to reroute to.
Note that a couple of calls to set_pte_at() remain in the arch code. This is intentional, since those call sites are acting on behalf of core-mm and should continue to call into the public set_ptes() rather than the arch-private __set_ptes().
Link: https://lkml.kernel.org/r/20240215103205.2607016-9-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
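Since core-mm already defines set_pte_at() in terms of set_ptes(), the conversion is mechanical, e.g.:

    /* Generic definition in include/linux/pgtable.h: */
    #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

    /* arm64 call sites become: */
    -	set_pte_at(mm, addr, ptep, pte);
    +	set_ptes(mm, addr, ptep, pte, 1);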
# 39db66e6 | 17-Jan-2024 | Randy Dunlap <rdunlap@infradead.org>
KVM: arm64: guest: fix kernel-doc warnings
Fix multiple function parameter descriptions to prevent warnings:
guest.c:718: warning: Function parameter or struct member 'vcpu' not described in 'kvm_arm_num_regs'
guest.c:736: warning: Function parameter or struct member 'vcpu' not described in 'kvm_arm_copy_reg_indices'
guest.c:736: warning: Function parameter or struct member 'uindices' not described in 'kvm_arm_copy_reg_indices'
arch/arm64/kvm/guest.c:915: warning: Excess function parameter 'kvm' description in 'kvm_arch_vcpu_ioctl_set_guest_debug'
arch/arm64/kvm/guest.c:915: warning: Excess function parameter 'kvm_guest_debug' description in 'kvm_arch_vcpu_ioctl_set_guest_debug'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.linux.dev
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20240117230714.31025-3-rdunlap@infradead.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
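For instance, a kernel-doc block of the kind these fixes complete (wording illustrative):

    /**
     * kvm_arm_num_regs - how many registers we expose via KVM_GET_ONE_REG
     * @vcpu: the vCPU being queried
     *
     * Return: the number of exposed registers.
     */
    unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);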
# 851354cb | 16-Oct-2023 | Andre Przywara <andre.przywara@arm.com>
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour.
Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG.
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Reported-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
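A sketch of the narrowed match, using the MIDR values quoted above (the XGENE1_MIDR constant is illustrative; the field masks come from asm/cputype.h):

    #define XGENE1_MIDR 0x500f0000  /* implementer 0x50 (APM), variant 0, part 0x000 */

    static bool is_xgene1(u32 midr)
    {
            const u32 mask = MIDR_IMPLEMENTOR_MASK | MIDR_VARIANT_MASK |
                             MIDR_PARTNUM_MASK;

            /* eMAG (0x503f0002) no longer matches: its variant field is 3. */
            return (midr & mask) == (XGENE1_MIDR & mask);
    }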
# d8569fba | 16-Oct-2023 | Mark Rutland <mark.rutland@arm.com>
arm64: kvm: Use cpus_have_final_cap() explicitly
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap().
For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching.
KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit:
d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final")
... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap():
| static __always_inline bool cpus_have_const_cap(int num)
| {
|         if (is_hyp_code())
|                 return cpus_have_final_cap(num);
|         else if (system_capabilities_finalized())
|                 return __cpus_have_const_cap(num);
|         else
|                 return cpus_have_cap(num);
| }
Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change.
Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extend the same treatment to this code and use cpus_have_final_cap() directly.
This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
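The conversion itself is then a one-for-one replacement, e.g. (cap name illustrative):

    -	if (cpus_have_const_cap(ARM64_HAS_SOME_CAP))
    +	if (cpus_have_final_cap(ARM64_HAS_SOME_CAP))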
# 5346f7e1 | 10-Jul-2023 | Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Always return generic v8 as the preferred target
Userspace selecting an implementation-specific vCPU target has been completely useless for a very long time. Let's go whole hog and start returning the generic v8 target across all implementations as the preferred target.
Uphold the pre-existing behavior by tolerating either the generic target or an implementation-specific target if the vCPU happens to be running on one of the lucky few parts.
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230710193140.1706399-5-oliver.upton@linux.dev
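Sketched effect on the KVM_ARM_PREFERRED_TARGET ioctl (surrounding plumbing elided):

    /* The preferred target is now unconditionally the generic v8 vCPU. */
    struct kvm_vcpu_init init = {
            .target = KVM_ARM_TARGET_GENERIC_V8,
    };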
# 680232a9 | 30-Mar-2023 | Marc Zyngier <maz@kernel.org>
KVM: arm64: timers: Allow save/restoring of the physical timer
Nothing like being 10 years late to a party!
Now that userspace can set counter offsets, we can save/restore the physical timer as well!
Nobody really cared so far, but you're welcome anyway.
Reviewed-by: Colton Lewis <coltonlewis@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230330174800.2677007-9-maz@kernel.org
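Concretely, a VMM can now round-trip the physical timer state through the existing ONE_REG interface, e.g. (error handling elided):

    __u64 cnt;
    struct kvm_one_reg reg = {
            .id   = KVM_REG_ARM_PTIMER_CNT,     /* likewise CTL and CVAL */
            .addr = (__u64)&cnt,
    };

    ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);      /* save on the source */
    ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);      /* restore on the target */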
# 4bba7f7d | 27-Mar-2023 | Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
There are various bits of VM-scoped data that can only be configured before the first call to KVM_RUN, such as the hypercall bitmaps and the PMU. As these fields are protected by the kvm->lock and accessed while holding vcpu->mutex, this is yet another example of lock inversion.
Change out the kvm->lock for kvm->arch.config_lock in all of these instances. Opportunistically simplify the locking mechanics of the PMU configuration by holding the config_lock for the entirety of kvm_arm_pmu_v3_set_attr().
Note that this also addresses a couple of bugs. There is an unguarded read of the PMU version in KVM_ARM_VCPU_PMU_V3_FILTER which could race with KVM_ARM_VCPU_PMU_V3_SET_PMU. Additionally, until now writes to the per-vCPU vPMU irq were not serialized VM-wide, meaning concurrent calls to KVM_ARM_VCPU_PMU_V3_IRQ could lead to a false positive in pmu_irq_is_valid().
Cc: stable@vger.kernel.org
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230327164747.2466958-4-oliver.upton@linux.dev
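Sketch of the resulting pattern (bodies elided):

    /* config_lock nests inside vcpu->mutex, avoiding the inversion with
     * kvm->lock; VM-scoped setup holds it for the whole operation. */
    mutex_lock(&kvm->arch.config_lock);
    /* ... read/write hypercall bitmap, PMU version, vPMU irq ... */
    mutex_unlock(&kvm->arch.config_lock);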
# 2def950c | 08-Feb-2023 | Thomas Huth <thuth@redhat.com>
KVM: arm64: Limit length in kvm_vm_ioctl_mte_copy_tags() to INT_MAX
In case of success, this function returns the amount of handled bytes. However, this does not work for large values: The function is called from kvm_arch_vm_ioctl() (which still returns a long), which in turn is called from kvm_vm_ioctl() in virt/kvm/kvm_main.c. And that function stores the return value in an "int r" variable. So the upper 32-bits of the "long" return value are lost there.
KVM ioctl functions should only return "int" values, so let's limit the amount of bytes that can be requested here to INT_MAX to avoid the problem with the truncated return value. We can then also change the return type of the function to "int" to make it clearer that it is not possible to return a "long" here.
Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Message-Id: <20230208140105.655814-5-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
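The guard and the new return type, sketched (the exact error code in the patch may differ):

    int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
                                   struct kvm_arm_copy_mte_tags *copy_tags)
    {
            unsigned long length = copy_tags->length;

            /* An int return value cannot report more than INT_MAX bytes. */
            if (length > INT_MAX)
                    return -EINVAL;
            /* ... */
    }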
# 1d05d51b | 09-Feb-2023 | Christoffer Dall <christoffer.dall@linaro.org>
KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
We were not allowing userspace to set a more privileged mode for the VCPU than EL1, but we should allow this when nested virtualization is enabled for the VCPU.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230209175820.1939006-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
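Sketch of the mode check with the EL2 cases admitted when NV is enabled (assuming the vcpu_has_nv() helper from the NV series):

    case PSR_MODE_EL2h:
    case PSR_MODE_EL2t:
            if (!vcpu_has_nv(vcpu))
                    return -EINVAL;
            break;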
# c3b37c2d | 19-Jan-2023 | Catalin Marinas <catalin.marinas@arm.com>
KVM: arm64: Pass the actual page address to mte_clear_page_tags()
Commit d77e59a8fccd ("arm64: mte: Lock a page for MTE tag initialisation") added a call to mte_clear_page_tags() in case a prior mte_copy_tags_from_user() failed in order to avoid stale tags in the guest page (it should have really been a separate commit). Unfortunately, the argument passed to this function was the address of the struct page rather than the actual page address. Fix this function call.
Fixes: d77e59a8fccd ("arm64: mte: Lock a page for MTE tag initialisation")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230119170902.1574756-1-catalin.marinas@arm.com
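The fix is a one-liner along these lines:

    -	mte_clear_page_tags(page);
    +	mte_clear_page_tags(page_address(page));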
# d77e59a8 | 04-Nov-2022 | Catalin Marinas <catalin.marinas@arm.com>
arm64: mte: Lock a page for MTE tag initialisation
Initialising the tags and setting PG_mte_tagged flag for a page can race between multiple set_pte_at() on shared pages or setting the stage 2 pte via user_mem_abort(). Introduce a new PG_mte_lock flag as PG_arch_3 and set it before attempting page initialisation. Given that PG_mte_tagged is never cleared for a page, consider setting this flag to mean page unlocked and wait on this bit with acquire semantics if the page is locked:
- try_page_mte_tagging() - lock the page for tagging, return true if it can be tagged, false if already tagged. No acquire semantics if it returns true (PG_mte_tagged not set) as there is no serialisation with a previous set_page_mte_tagged().
- set_page_mte_tagged() - set PG_mte_tagged with release semantics.
The two-bit locking is based on Peter Collingbourne's idea.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-6-pcc@google.com
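Typical usage of the pair, as a sketch:

    if (try_page_mte_tagging(page)) {
            /* We won the race: initialise the tags, then publish. */
            mte_clear_page_tags(page_address(page));
            set_page_mte_tagged(page);      /* release semantics */
    }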
# e059853d | 04-Nov-2022 | Catalin Marinas <catalin.marinas@arm.com>
arm64: mte: Fix/clarify the PG_mte_tagged semantics
Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags.
Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[pcc@google.com: fix build with CONFIG_ARM64_MTE disabled]
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com
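A sketch of the two accessors with the ordering described (details may differ from the actual implementation):

    static inline void set_page_mte_tagged(struct page *page)
    {
            /* Tag writes must be visible before the flag (release). */
            smp_wmb();
            set_bit(PG_mte_tagged, &page->flags);
    }

    static inline bool page_mte_tagged(struct page *page)
    {
            bool ret = test_bit(PG_mte_tagged, &page->flags);

            /* Pairs with the barrier above before any tag reads (acquire). */
            if (ret)
                    smp_rmb();
            return ret;
    }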
# 370531d1 | 17-Sep-2022 | Reiji Watanabe <reijiw@google.com>
KVM: arm64: Clear PSTATE.SS when the Software Step state was Active-pending
While userspace enables single-step, if the Software Step state at the last guest exit was "Active-pending", clear PSTATE.SS on guest entry to restore the state.
Currently, KVM sets PSTATE.SS to 1 on every guest entry while userspace enables single-step for the vCPU (with KVM_GUESTDBG_SINGLESTEP). It means KVM always makes the vCPU's Software Step state "Active-not-pending" on the guest entry, which lets the VCPU perform single-step (then Software Step exception is taken). This could cause extra single-step (without returning to userspace) if the Software Step state at the last guest exit was "Active-pending" (i.e. the last exit was triggered by an asynchronous exception after the single-step is performed, but before the Software Step exception is taken. See "Figure D2-3 Software step state machine" and "D2.12.7 Behavior in the active-pending state" in ARM DDI 0487I.a for more info about this behavior).
Fix this by clearing PSTATE.SS on guest entry if the Software Step state at the last exit was "Active-pending" so that KVM restores the state (and the exception is taken before further single-step is performed).
Fixes: 337b99bf7edf ("KVM: arm64: guest debug, add support for single-step")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220917010600.532642-3-reijiw@google.com
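Sketch of the entry-time handling, assuming a vCPU flag such as DBG_SS_ACTIVE_PENDING recorded at exit:

    /* On guest entry with KVM_GUESTDBG_SINGLESTEP enabled: */
    if (vcpu_get_flag(vcpu, DBG_SS_ACTIVE_PENDING))
            *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;   /* keep Active-pending */
    else
            *vcpu_cpsr(vcpu) |= DBG_SPSR_SS;    /* Active-not-pending */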
# b10d86fb | 16-Aug-2022 | Oliver Upton <oliver.upton@linux.dev>
KVM: arm64: Reject 32bit user PSTATE on asymmetric systems
KVM does not support AArch32 EL0 on asymmetric systems. To that end, prevent userspace from configuring a vCPU in such a state through setting PSTATE.
It is already ABI that KVM rejects such a write on a system where AArch32 EL0 is unsupported. Though the kernel's definition of a 32bit system changed in commit 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support"), KVM's did not.
Fixes: 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220816192554.1455559-3-oliver.upton@linux.dev
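Sketch of the tightened check, assuming a kvm_supports_32bit_el0() helper that requires AArch32 EL0 on every CPU rather than just the current one:

    /* Any AArch32 mode written into PSTATE: */
    if (!kvm_supports_32bit_el0())
            return -EINVAL;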
# 05714cab | 02-May-2022 | Raghavendra Rao Ananta <rananta@google.com>
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM regularly introduces new hypercall services to the guests without any consent from the userspace. This means the guests can observe hypercall services come and go as they migrate across various host kernel versions. This could be a major problem if the guest discovered a hypercall, started using it, and after getting migrated to an older kernel realized that it's no longer available. Depending on how the guest handles the change, there's a chance that the guest would just panic.
As a result, there's a need for the userspace to elect the services that it wishes the guest to discover. It can elect these services based on the kernels spread across its (migration) fleet. To remedy this, extend the existing firmware pseudo-registers, such as KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space for all the hypercall services available.
These firmware registers are categorized based on the service call owners, but unlike the existing firmware pseudo-registers, they hold the features supported in the form of a bitmap.
During the VM initialization, the registers are set to upper-limit of the features supported by the corresponding registers. It's expected that the VMMs discover the features provided by each register via GET_ONE_REG, and write back the desired values using SET_ONE_REG. KVM allows this modification only until the VM has started.
Some of the standard features are not mapped to any bits of the registers. But since they can recreate the original problem of making it available without userspace's consent, they need to be explicitly added to the case-list in kvm_hvc_call_default_allowed(). Any function-id that's not enabled via the bitmap, or not listed in kvm_hvc_call_default_allowed, will be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
Older userspace code can simply ignore the feature and the hypercall services will be exposed unconditionally to the guests, thus ensuring backward compatibility.
In this patch, the framework adds the register only for ARM's standard secure services (owner value 4). Currently, this includes support only for the ARM True Random Number Generator (TRNG) service, with bit-0 of the register representing mandatory features of v1.0. Other services will be added in the upcoming patches.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values, dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
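From the VMM side the flow looks roughly like this (register and bit names from the uapi additions in this series):

    __u64 bmap;
    struct kvm_one_reg reg = {
            .id   = KVM_REG_ARM_STD_BMAP,   /* owner 4: ARM standard services */
            .addr = (__u64)&bmap,
    };

    ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);          /* upper limit of features */
    bmap &= BIT(KVM_REG_ARM_STD_BIT_TRNG_V1_0);     /* elect TRNG v1.0 only */
    ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);          /* before the first KVM_RUN */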
# 85fbe08e | 02-May-2022 | Raghavendra Rao Ananta <rananta@google.com>
KVM: arm64: Factor out firmware register handling from psci.c
Common hypercall firmware register handling is currently employed by psci.c. Since the upcoming patches add more of these registers, it's better to move the generic handling to hypercalls.c for a cleaner presentation.
While we are at it, collect all the firmware registers under fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and kvm_arm_copy_fw_reg_indices() in a generic way. Also, define KVM_REG_FEATURE_LEVEL_MASK using a GENMASK instead.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: fixed KVM_REG_FEATURE_LEVEL_MASK]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-2-rananta@google.com
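The mask change is a small tidy-up, roughly (previous definition paraphrased):

    -#define KVM_REG_FEATURE_LEVEL_MASK	0xf
    +#define KVM_REG_FEATURE_LEVEL_WIDTH	4
    +#define KVM_REG_FEATURE_LEVEL_MASK	GENMASK(KVM_REG_FEATURE_LEVEL_WIDTH - 1, 0)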
# 21ea4578 | 18-Mar-2022 | Julia Lawall <Julia.Lawall@inria.fr>
KVM: arm64: fix typos in comments
Various spelling mistakes in comments. Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
# 08e873cb | 05-Nov-2021 | YueHaibing <yuehaibing@huawei.com>
KVM: arm64: Change the return type of kvm_vcpu_preferred_target()
kvm_vcpu_preferred_target() always returns 0 because kvm_target_cpu() never returns a negative error code.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
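I.e., the prototype becomes:

    -int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);
    +void kvm_vcpu_preferred_target(struct kvm_vcpu_init *init);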
# f95937cc | 02-Aug-2021 | Jing Zhang <jingzhangos@google.com>
KVM: stats: Support linear and logarithmic histogram statistics
Add new types of KVM stats, linear and logarithmic histograms. Histograms are very useful for observing the value distribution of time- or size-related stats.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
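Conceptually, bucket selection works like this (a sketch, not the exact kernel helpers):

    /* Linear histogram: fixed-size buckets, last bucket catches overflow. */
    index = min(value / bucket_size, nbuckets - 1);

    /* Logarithmic histogram: bucket by position of the highest set bit. */
    index = min((u64)fls64(value), nbuckets - 1);
    ++data[index];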
# fe5161d2 | 02-Aug-2021 | Oliver Upton <oupton@google.com>
KVM: arm64: Record number of signal exits as a vCPU stat
Most other architectures that implement KVM record a statistic indicating the number of times a vCPU has exited due to a pending signal. Add support for that stat to arm64.
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210802192809.1851010-2-oupton@google.com
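The stat is bumped where the run loop bails out on a pending signal, along these lines:

    if (signal_pending(current)) {
            ret = -EINTR;
            run->exit_reason = KVM_EXIT_INTR;
            ++vcpu->stat.signal_exits;
    }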