# 1ed0f119 | 06-Sep-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Explicitly invalidate posted_intr_nv if PI is disabled at VM-Enter
Explicitly invalidate posted_intr_nv when emulating nested VM-Enter and posted interrupts are disabled to make it clear that posted_intr_nv is valid if and only if nested posted interrupts are enabled, and as a cheap way to harden against KVM bugs.
KVM initializes posted_intr_nv to -1 at vCPU creation and resets it to -1 when unloading vmcs12 and/or leaving nested mode, i.e. this is not a bug fix (or at least, it's not intended to be a bug fix).
Note, tracking nested.posted_intr_nv as a u16 subtly adds a measure of safety, as it prevents unintentionally matching KVM's informal "no IRQ" vector of -1, stored as a signed int. Because a u16 can always be represented as a signed int, the effective "invalid" value of posted_intr_nv, 65535, will be preserved as-is when comparing against an int, i.e. will be zero-extended, not sign-extended, and thus won't get a false positive if KVM is buggy and compares posted_intr_nv against -1.
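As a standalone illustration of the promotion rules at play (plain C, not KVM code):

    #include <stdio.h>

    int main(void)
    {
            unsigned short posted_intr_nv = -1; /* wraps to 65535 */
            int no_irq_vector = -1;

            /* posted_intr_nv is promoted to int by zero-extension
             * (65535), while no_irq_vector stays -1, so a buggy
             * comparison against -1 can never match. */
            printf("%s\n", posted_intr_nv == no_irq_vector ? "match" : "no match");
            return 0;
    }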
Opportunistically add a comment in vmx_deliver_nested_posted_interrupt() to call out that it must check vmx->nested.posted_intr_nv, not the vector in vmcs12, which is presumably the _entire_ reason nested.posted_intr_nv exists. E.g. vmcs12 is a KVM-controlled snapshot, so there are no TOCTOU races to worry about; the only potential badness is if the vCPU leaves nested mode and frees vmcs12 between the sender checking is_guest_mode() and dereferencing the vmcs12 pointer.
Link: https://lore.kernel.org/r/20240906043413.1049633-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 6e0b4565 | 06-Sep-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection
When synthesizing a nested VM-Exit due to an external interrupt, pend a nested posted interrupt if the external interrupt vector matches L2's PI notification vector, i.e. if the interrupt is a PI notification for L2. This fixes a bug where KVM will incorrectly inject VM-Exit instead of processing the nested posted interrupt when IPI virtualization is enabled.
Per the SDM, detection of the notification vector doesn't occur until the interrupt is acknowledged and delivered to the CPU core.
If the external-interrupt exiting VM-execution control is 1, any unmasked external interrupt causes a VM exit (see Section 26.2). If the "process posted interrupts" VM-execution control is also 1, this behavior is changed and the processor handles an external interrupt as follows:
1. The local APIC is acknowledged; this provides the processor core with an interrupt vector, called here the physical vector.
2. If the physical vector equals the posted-interrupt notification vector, the logical processor continues to the next step. Otherwise, a VM exit occurs as it would normally due to an external interrupt; the vector is saved in the VM-exit interruption-information field.
For the most part, KVM has avoided problems because a PI NV for L2 that arrives while L2 is active will be processed by hardware, and KVM checks for a pending notification vector during nested VM-Enter. Thus, to hit the bug, the PI NV interrupt needs to sneak its way into L1's vIRR while L2 is active.
Without IPI virtualization, the scenario is practically impossible to hit, modulo L1 doing weird things (see below), as the ordering between vmx_deliver_posted_interrupt() and nested VM-Enter effectively guarantees that either the sender will see the vCPU as being in_guest_mode(), or the receiver will see the interrupt in its vIRR.
With IPI virtualization, introduced by commit d588bb9be1da ("KVM: VMX: enable IPI virtualization"), the sending CPU effectively implements a rough equivalent of vmx_deliver_posted_interrupt(), sans the nested PI NV check. If the target vCPU has a valid PID, the CPU will send a PI NV interrupt based on _L1's_ PID, because the sender's IPIv table points at L1 PIDs.
    PIR := 32 bytes at PID_ADDR;            // under lock
    PIR[V] := 1;
    store PIR at PID_ADDR;                  // release lock

    NotifyInfo := 8 bytes at PID_ADDR + 32; // under lock
    IF NotifyInfo.ON = 0 AND NotifyInfo.SN = 0; THEN
        NotifyInfo.ON := 1;
        SendNotify := 1;
    ELSE
        SendNotify := 0;
    FI;
    store NotifyInfo at PID_ADDR + 32;      // release lock

    IF SendNotify = 1; THEN
        send an IPI specified by NotifyInfo.NDST and NotifyInfo.NV;
    FI;
As a result, the target vCPU ends up receiving an interrupt on KVM's POSTED_INTR_VECTOR while L2 is running, with an interrupt in L1's PIR for L2's nested PI NV. The POSTED_INTR_VECTOR interrupt triggers a VM-Exit from L2 to L0, KVM moves the interrupt from L1's PIR to vIRR, triggers a KVM_REQ_EVENT prior to re-entry to L2, and calls vmx_check_nested_events(), effectively bypassing all of KVM's "early" checks on nested PI NV.
Without IPI virtualization, the bug can likely be hit only if L1 programs an assigned device to _post_ an interrupt to L2's notification vector, by way of L1's PID.PIR. Doing so would allow the interrupt to get into L1's vIRR without KVM checking vmcs12's NV. Which is architecturally allowed, but unlikely behavior for a hypervisor.
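A sketch of the resulting injection-time check (names follow KVM's nested VMX code, but this is an approximation, not the verbatim diff):

    /* If the to-be-acknowledged IRQ is L2's PI notification vector,
     * pend and process the nested posted interrupt instead of
     * injecting an external-interrupt VM-Exit, mirroring what
     * hardware would have done had the IRQ arrived while L2 was
     * active. */
    if (irq == vmx->nested.posted_intr_nv) {
            vmx->nested.pi_pending = true;
            kvm_apic_clear_irr(vcpu, irq);
            goto no_vmexit;
    }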
Cc: Zeng Guang <guang.zeng@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20240906043413.1049633-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 8c23670f | 06-Sep-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ
In the should-be-impossible scenario that kvm_cpu_get_interrupt() doesn't return a valid vector after checking kvm_cpu_has_interrupt(), skip VM-Exit injection to reduce the probability of crashing/confusing L1. Now that KVM gets the IRQ _before_ calling nested_vmx_vmexit(), squashing the VM-Exit injection is trivial since there are no actions that need to be undone.
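A sketch of the guard at the injection site (a hedged approximation, not the verbatim diff; 'irq' is the vector returned by the open-coded interrupt query):

    /* Getting here with no IRQ should be impossible given the earlier
     * kvm_cpu_has_interrupt() check; don't inject a garbage vector. */
    if (WARN_ON_ONCE(irq < 0))
            goto no_vmexit;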
Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20240906043413.1049633-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 363010e1 | 06-Sep-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Get to-be-acknowledged IRQ for nested VM-Exit at injection site
Move the logic to get the to-be-acknowledged IRQ for a nested VM-Exit from nested_vmx_vmexit() to vmx_check_nested_events(), which is subtly the one and only path where KVM invokes nested_vmx_vmexit() with EXIT_REASON_EXTERNAL_INTERRUPT. A future fix will perform a last-minute check on L2's nested posted interrupt notification vector, just before injecting a nested VM-Exit. To handle that scenario correctly, KVM needs to get the interrupt _before_ injecting the VM-Exit, as simply querying the highest priority interrupt, via kvm_cpu_has_interrupt(), would result in a TOCTOU bug: a new, higher priority interrupt could arrive between kvm_cpu_has_interrupt() and kvm_cpu_get_interrupt().
Unfortunately, simply moving the call to kvm_cpu_get_interrupt() doesn't suffice, as a VMWRITE to GUEST_INTERRUPT_STATUS.SVI is hiding in kvm_get_apic_interrupt(), and acknowledging the interrupt before nested VM-Exit would cause the VMWRITE to hit vmcs02 instead of vmcs01.
Open code a rough equivalent to kvm_cpu_get_interrupt() so that the IRQ is acknowledged after emulating VM-Exit, taking care to avoid the TOCTOU issue described above.
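Roughly, the injection site ends up shaped as follows (a sketch; helper names approximate the upstream split of kvm_get_apic_interrupt() into query and acknowledge phases):

    /* PIC/ExtINT interrupts don't touch the APIC, so no late ACK is
     * needed for them. */
    int irq = kvm_cpu_get_extint(vcpu);
    if (irq != -1) {
            nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
                              INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
            return 0;
    }

    /* Query the APIC without acknowledging; no SVI VMWRITE here. */
    irq = kvm_apic_has_interrupt(vcpu);

    nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
                      INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);

    /* ACK only after emulating VM-Exit so the VMWRITE to
     * GUEST_INTERRUPT_STATUS.SVI hits vmcs01, not vmcs02. */
    kvm_apic_ack_interrupt(vcpu, irq);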
Opportunistically convert the WARN_ON() to a WARN_ON_ONCE(). If KVM has a bug that results in a false positive from kvm_cpu_has_interrupt(), spamming dmesg won't help the situation.
Note, nested_vmx_reflect_vmexit() can never reflect external interrupts as they are always "wanted" by L0.
Link: https://lore.kernel.org/r/20240906043413.1049633-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 41ab0d59 | 25-Jul-2024 | Maxim Levitsky <mlevitsk@redhat.com>
KVM: nVMX: Use vmx_segment_cache_clear() instead of open coded equivalent
In prepare_vmcs02_rare(), call vmx_segment_cache_clear() instead of setting segment_cache.bitmask directly. Using the helper minimizes the chances of prepare_vmcs02_rare() doing the wrong thing in the future, e.g. if KVM ends up doing more than just zero the bitmask when purging the cache.
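For reference, the helper in question is a one-liner (its current shape, per the changelog's description):

    static void vmx_segment_cache_clear(struct vcpu_vmx *vmx)
    {
            vmx->segment_cache.bitmask = 0;
    }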
No functional change intended.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240725175232.337266-2-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
# 653ea448 | 22-Jul-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Honor userspace MSR filter lists for nested VM-Enter/VM-Exit
Synthesize a consistency check VM-Exit (VM-Enter) or VM-Abort (VM-Exit) if L1 attempts to load/store an MSR via the VMCS MSR lists that userspace has disallowed access to via an MSR filter. Intel already disallows including a handful of "special" MSRs in the VMCS lists, so denying access isn't completely without precedent.
More importantly, the behavior is well-defined _and_ can be communicated to the end user, e.g. to the customer that owns a VM running as L1 on top of KVM. On the other hand, ignoring userspace MSR filters is all but guaranteed to result in unexpected behavior, as the access will hit KVM's internal state, which is likely not up-to-date.
Unlike KVM-internal accesses, instruction emulation, and dedicated VMCS fields, the MSRs in the VMCS load/store lists are 100% guest controlled, thus making it all but impossible to reason about the correctness of ignoring the MSR filter. And if userspace *really* wants to deny access to MSRs via the aforementioned scenarios, userspace can hide the associated feature from the guest, e.g. by disabling the PMU to prevent accessing PERF_GLOBAL_CTRL via its VMCS field. But for the MSR lists, KVM is blindly processing MSRs; the MSR filters are the _only_ way for userspace to deny access.
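Conceptually, the MSR list walks gain a filter check along these lines (a sketch with simplified error handling; kvm_msr_allowed() is KVM's existing filter query, and msr_index stands in for the entry being processed):

    /* VM-Enter MSR loads write the MSR; honor the userspace write
     * filter before touching KVM's internal MSR state. */
    if (!kvm_msr_allowed(vcpu, msr_index, KVM_MSR_FILTER_WRITE))
            return -EPERM;  /* caller synthesizes the architectural failure */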
This partially reverts commit ac8d6cad3c7b ("KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr").
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Cc: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20240722235922.3351122-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 566975f6 | 05-Jun-2024 | Xin Li <xin3.li@intel.com>
KVM: nVMX: Use macros and #defines in vmx_restore_vmx_misc()
Use macros in vmx_restore_vmx_misc() instead of open coding everything using BIT_ULL() and GENMASK_ULL(). Opportunistically split feature bits and reserved bits into separate variables, and add a comment explaining the subset logic (it's not immediately obvious that the set of feature bits is NOT the set of _supported_ feature bits).
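The subset check itself is nested.c's existing is_bitwise_subset() helper, which looks roughly like:

    static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
    {
            superset &= mask;
            subset &= mask;

            /* every masked bit set in 'subset' must be in 'superset' */
            return (superset | subset) == superset;
    }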
Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog, drop #defines] Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# dc1e67f7 | 05-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: VMX: Move MSR_IA32_VMX_MISC bit defines to asm/vmx.h
Move the handful of MSR_IA32_VMX_MISC bit defines that are currently in msr-index.h to vmx.h so that all of the VMX_MISC defines and wrappers can be found in a single location.
Opportunistically use BIT_ULL() instead of open coding hex values, add defines for feature bits that are architecturally defined, and move the defines down in the file so that they are colocated with the helpers for getting fields from VMX_MISC.
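E.g. the moved defines take the BIT_ULL() form (an illustrative subset):

    #define VMX_MISC_INTEL_PT                 BIT_ULL(14)
    #define VMX_MISC_VMWRITE_SHADOW_RO_FIELDS BIT_ULL(29)
    #define VMX_MISC_ZERO_LEN_INS             BIT_ULL(30)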
No functional change intended.
Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 92e64804 | 05-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Add a helper to encode VMCS info in MSR_IA32_VMX_BASIC
Add a helper to encode the VMCS revision, size, and supported memory types in MSR_IA32_VMX_BASIC, i.e. when synthesizing KVM's supported BASIC MSR value, and delete the now unused VMCS size and memtype shift macros.
For a variety of reasons, KVM has shifted (pun intended) to using helpers to *get* information from the VMX MSRs, as opposed to defined MASK and SHIFT macros for direct use. Provide a similar helper for the nested VMX code, which needs to *set* information, so that KVM isn't left with a mix of SHIFT macros and dedicated helpers.
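Given MSR_IA32_VMX_BASIC's architectural layout (revision in bits 30:0, VMCS region size in bits 44:32, memory type in bits 53:50), the helper amounts to (sketch):

    static inline u64 vmx_basic_encode_vmcs_info(u32 revision, u16 size,
                                                 u8 memtype)
    {
            return revision | ((u64)size << 32) | ((u64)memtype << 50);
    }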
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# c97b106f | 05-Jun-2024 | Xin Li <xin3.li@intel.com>
KVM: nVMX: Use macros and #defines in vmx_restore_vmx_basic()
Use macros in vmx_restore_vmx_basic() instead of open coding everything using BIT_ULL() and GENMASK_ULL(). Opportunistically split feature bits and reserved bits into separate variables, and add a comment explaining the subset logic (it's not immediately obvious that the set of feature bits is NOT the set of _supported_ feature bits).
Cc: Shan Kang <shan.kang@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Xin Li <xin3.li@intel.com> [sean: split to separate patch, write changelog, drop #defines] Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# e7e80b66 | 05-Jun-2024 | Sean Christopherson <seanjc@google.com>
x86/cpu: KVM: Add common defines for architectural memory types (PAT, MTRRs, etc.)
Add defines for the architectural memory types that can be shoved into various MSRs and registers, e.g. MTRRs, PAT, VMX capabilities MSRs, EPTPs, etc. While most MSRs/registers support only a subset of all memory types, the values themselves are architectural and identical across all users.
Leave the goofy MTRR_TYPE_* definitions as-is since they are in a uapi header, but add compile-time assertions to connect the dots (and sanity check that the msr-index.h values didn't get fat-fingered).
Keep the VMX_EPTP_MT_* defines so that it's slightly more obvious that the EPTP holds a single memory type in 3 of its 64 bits; those bits just happen to be 2:0, i.e. don't need to be shifted.
Opportunistically use X86_MEMTYPE_WB instead of an open coded '6' in setup_vmcs_config().
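The architectural encodings being consolidated (values per the SDM; names as added by the patch, comment text approximate):

    #define X86_MEMTYPE_UC       0ull /* Uncacheable */
    #define X86_MEMTYPE_WC       1ull /* Write Combining */
    #define X86_MEMTYPE_WT       4ull /* Write Through */
    #define X86_MEMTYPE_WP       5ull /* Write Protected */
    #define X86_MEMTYPE_WB       6ull /* Write Back */
    #define X86_MEMTYPE_UC_MINUS 7ull /* UC minus, MTRRs only */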
No functional change intended.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240605231918.2915961-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 321ef62b | 07-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Fold requested virtual interrupt check into has_nested_events()
Check for a Requested Virtual Interrupt, i.e. a virtual interrupt that is pending delivery, in vmx_has_nested_events() and drop the one-off kvm_x86_ops.guest_apic_has_interrupt() hook.
In addition to dropping a superfluous hook, this fixes a bug where KVM would incorrectly treat virtual interrupts _for L2_ as always enabled due to kvm_arch_interrupt_allowed(), by way of vmx_interrupt_blocked(), treating IRQs as enabled if L2 is active and vmcs12 is configured to exit on IRQs, i.e. KVM would treat a virtual interrupt for L2 as a valid wake event based on L1's IRQ blocking status.
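Architecturally, a virtual interrupt is pending delivery when RVI's priority class exceeds the virtual PPR's; a standalone model of the check (not the verbatim KVM code):

    /* SDM: a pending virtual interrupt is recognized for delivery
     * iff RVI[7:4] > VPPR[7:4]. */
    static bool has_requested_virtual_interrupt(u8 rvi, u8 vppr)
    {
            return (rvi >> 4) > (vppr >> 4);
    }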
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 27c4fa42 | 07-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Check for pending posted interrupts when looking for nested events
Check for pending (and notified!) posted interrupts when checking if L2 has a pending wake event, as a fully posted/notified virtual interrupt is a valid wake event for HLT.
Note that KVM must check vmx->nested.pi_pending to avoid prematurely waking L2, e.g. even if KVM sees a non-zero PID.PIR and PID.ON=1, the virtual interrupt won't actually be recognized until a notification IRQ is received by the vCPU or the vCPU does (nested) VM-Enter.
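A sketch of the resulting check (approximate; pi_test_on() and pi_find_highest_vector() are the PI descriptor helpers referenced by this series):

    /* Wake L2 only if the notification was sent (pi_pending), the PID
     * has an outstanding notification (ON), and the PIR is non-empty. */
    if (vmx->nested.pi_pending && vmx->nested.pi_desc &&
        pi_test_on(vmx->nested.pi_desc))
            return pi_find_highest_vector(vmx->nested.pi_desc) != -1;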
Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Jim Mattson <jmattson@google.com> Closes: https://lore.kernel.org/all/20231207010302.2240506-1-jmattson@google.com Link: https://lore.kernel.org/r/20240607172609.3205077-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 32f55e47 | 07-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Request immediate exit iff pending nested event needs injection
When requesting an immediate exit from L2 in order to inject a pending event, do so only if the pending event actually requires manual injection, i.e. if and only if KVM actually needs to regain control in order to deliver the event.
Avoiding the "immediate exit" isn't simply an optimization, it's necessary to make forward progress, as the "already expired" VMX preemption timer trick that KVM uses to force a VM-Exit has higher priority than events that aren't directly injected.
At present, this is a glorified nop, as all events processed by vmx_has_nested_events() require injection, but that will not hold true in the future, e.g. if there's a pending virtual interrupt in vmcs02.RVI. I.e. if KVM is trying to deliver a virtual interrupt to L2, the expired VMX preemption timer will trigger VM-Exit before the virtual interrupt is delivered, and KVM will effectively hang the vCPU in an endless loop of forced immediate VM-Exits (because the pending virtual interrupt never goes away).
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# d83c36d8 | 07-Jun-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Add a helper to get highest pending vector from Posted Interrupt descriptor
Add a helper to retrieve the highest pending vector given a Posted Interrupt descriptor. While the actual operation is straightforward, it's surprisingly easy to mess up, e.g. if one tries to reuse lapic.c's find_highest_vector(), which doesn't work with PID.PIR due to the APIC's IRR and ISR component registers being physically discontiguous (they're 4-byte registers aligned at 16-byte intervals).
To make PIR handling more consistent with respect to IRR and ISR handling, return -1 to indicate "no interrupt pending".
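Because PID.PIR is a physically contiguous 256-bit bitmap, the helper reduces to a find-last-bit (a sketch matching the described semantics):

    static inline int pi_find_highest_vector(struct pi_desc *pi_desc)
    {
            int vec;

            /* find_last_bit() returns the bitmap size if no bit is set. */
            vec = find_last_bit((unsigned long *)pi_desc->pir, 256);
            return vec < 256 ? vec : -1;
    }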
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 7974c064 | 23-Apr-2024 | Sean Christopherson <seanjc@google.com>
KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...
Add "struct kvm_host_values kvm_host" to hold the various host values that KVM snapshots during initialization. Bundling the host values into a single struct simplifies adding new MSRs and other features with host state/values that KVM cares about, and provides a one-stop shop. E.g. adding a new value requires one line, whereas tracking each value individual often requires three: declaration, definition, and export.
No functional change intended.
Link: https://lore.kernel.org/r/20240423221521.2923759-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 9031b421 | 18-May-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)
Always handle #VEs, e.g. those generated when proving EPT Violation #VE support, in L0, as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs are KVM's responsibility.
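In nested exit-routing terms, the change boils down to L0 unconditionally claiming #VE exceptions (a sketch; is_ve_fault() checks the exception vector against #VE, vector 20):

    /* KVM exposes no #VE capabilities to L1, so never reflect a #VE
     * from L2 to L1; every #VE is L0's to handle. */
    if (is_ve_fault(intr_info))
            return true;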
Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
# d1b32ecd | 18-May-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support
Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE enabled and a VE info address pointing at pfn 0.
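The fix amounts to a one-time VMWRITE while initializing vmcs02 (a sketch approximating the upstream code):

    /* Point vmcs02 at the vCPU's #VE info page instead of leaving the
     * field at its zeroed default, i.e. pointing at pfn 0. */
    if (vmx->ve_info)
            vmcs_write64(VE_INFORMATION_ADDRESS, __pa(vmx->ve_info));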
Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240518000430.1118488-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
# 23ffe4bb | 09-Feb-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Add a sanity check that nested PML Full stems from EPT Violations
Add a WARN_ON_ONCE() sanity check to verify that a nested PML Full VM-Exit is only synthesized when the original VM-Exit from L2 was an EPT Violation. While KVM can fall through to kvm_mmu_do_page_fault() if an EPT Misconfig occurs on a stale MMIO SPTE, KVM should not treat the access as a write (there isn't enough information to know *what* the access was), i.e. KVM should never try to insert a PML entry in that case.
Link: https://lore.kernel.org/r/20240209221700.393189-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# a9466078 | 09-Feb-2024 | Sean Christopherson <seanjc@google.com>
KVM: x86: Move nEPT exit_qualification field from kvm_vcpu_arch to x86_exception
Move the exit_qualification field that is used to track information about in-flight nEPT violations from "struct kvm_vcpu_arch" to "x86_exception", i.e. associate the information with the actual nEPT violation instead of the vCPU. To handle bits that are pulled from vmcs.EXIT_QUALIFICATION, i.e. that are propagated from the "original" EPT violation VM-Exit, simply grab them from the VMCS on-demand when injecting a nEPT Violation or a PML Full VM-exit.
Aside from being ugly, having an exit_qualification field in kvm_vcpu_arch is outright dangerous, e.g. see commit d7f0a00e438d ("KVM: VMX: Report up-to-date exit qualification to userspace").
Opportunistically add a comment to call out that PML Full and EPT Violation VM-Exits use the same bit to report NMI blocking information.
Link: https://lore.kernel.org/r/20240209221700.393189-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 0c476514 | 09-Feb-2024 | Sean Christopherson <seanjc@google.com>
KVM: nVMX: Clear EXIT_QUALIFICATION when injecting an EPT Misconfig
Explicitly clear the EXIT_QUALIFICATION field when injecting an EPT misconfig into L1, as required by the VMX architecture. Per the SDM:
This field is saved for VM exits due to the following causes: debug exceptions; page-fault exceptions; start-up IPIs (SIPIs); system-management interrupts (SMIs) that arrive immediately after the execution of I/O instructions; task switches; INVEPT; INVLPG; INVPCID; INVVPID; LGDT; LIDT; LLDT; LTR; SGDT; SIDT; SLDT; STR; VMCLEAR; VMPTRLD; VMPTRST; VMREAD; VMWRITE; VMXON; WBINVD; WBNOINVD; XRSTORS; XSAVES; control-register accesses; MOV DR; I/O instructions; MWAIT; accesses to the APIC-access page; EPT violations; EOI virtualization; APIC-write emulation; page-modification log full; SPP-related events; and instruction timeout. For all other VM exits, this field is cleared.
Generating EXIT_QUALIFICATION from vcpu->arch.exit_qualification is wrong for all (two) paths that lead to nested_ept_inject_page_fault(). For EPT violations (the common case), vcpu->arch.exit_qualification will have been set by handle_ept_violation() to vmcs02.EXIT_QUALIFICATION, i.e. contains the information of an EPT violation and thus is likely non-zero.
For an EPT misconfig, which can reach FNAME(walk_addr_generic) and thus inject a nEPT misconfig if KVM created an MMIO SPTE that became stale, vcpu->arch.exit_qualification will hold the information from the last EPT violation VM-Exit, as vcpu->arch.exit_qualification is _only_ written by handle_ept_violation().
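After the fix, the injection path derives the exit qualification from the exception itself (a sketch of the nested_ept_inject_page_fault() logic; PML Full handling omitted):

    if (fault->error_code & PFERR_RSVD_MASK) {
            /* EPT misconfig: the SDM mandates a cleared field. */
            exit_reason = EXIT_REASON_EPT_MISCONFIG;
            exit_qualification = 0;
    } else {
            /* EPT violation: propagate the in-flight fault's info. */
            exit_reason = EXIT_REASON_EPT_VIOLATION;
            exit_qualification = fault->exit_qualification;
    }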
Fixes: 4704d0befb07 ("KVM: nVMX: Exiting from L2 to L1") Link: https://lore.kernel.org/r/20240209221700.393189-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 2a5f091c | 09-Feb-2024 | Sean Christopherson <seanjc@google.com>
KVM: x86: Open code all direct reads to guest DR6 and DR7
Bite the bullet, and open code all direct reads of DR6 and DR7. KVM currently has a mix of open coded accesses and calls to kvm_get_dr(), which is confusing and ugly because there's no rhyme or reason as to why any particular chunk of code uses kvm_get_dr().
The obvious alternative is to force all accesses through kvm_get_dr(), but it's not at all clear that doing so would be a net positive, e.g. even if KVM ends up wanting/needing to force all reads through a common helper, e.g. to play caching games, the cost of reverting this change is likely lower than the ongoing cost of maintaining weird, arbitrary code.
No functional change intended.
Cc: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240209220752.388160-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# fc5375dd | 09-Feb-2024 | Sean Christopherson <seanjc@google.com>
KVM: x86: Make kvm_get_dr() return a value, not use an out parameter
Convert kvm_get_dr()'s output parameter to a return value, and clean up most of the mess that was created by forcing callers to provide a pointer.
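Before/after at an illustrative call site:

    /* Before: the out parameter forces a separate declaration. */
    unsigned long dr6;
    kvm_get_dr(vcpu, 6, &dr6);

    /* After: plain return value. */
    unsigned long dr6 = kvm_get_dr(vcpu, 6);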
No functional change intended.
Acked-by: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240209220752.388160-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# f19063b1 | 10-Nov-2023 | Sean Christopherson <seanjc@google.com>
KVM: x86/pmu: Snapshot event selectors that KVM emulates in software
Snapshot the event selectors for the events that KVM emulates in software, which are currently instructions retired and branch instructions retired. The event selectors are tied to the underlying CPU, i.e. are constant for a given platform even though perf doesn't manage the mappings as such.
Getting the event selectors from perf isn't exactly cheap, especially if mitigations are enabled, as at least one indirect call is involved.
Snapshot the values in KVM instead of optimizing perf, as working with the raw event selectors will be required if KVM ever wants to emulate events that aren't part of perf's uABI, i.e. that don't have an "enum perf_hw_id" entry.
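A sketch of the snapshotting (variable names approximate KVM's; perf_get_hw_event_config() is perf's existing query for a hardware event's raw config):

    /* Snapshot once at init; the selectors are constant per platform. */
    kvm_pmu_eventsel_instructions_retired =
            perf_get_hw_event_config(PERF_COUNT_HW_INSTRUCTIONS);
    kvm_pmu_eventsel_branch_instructions_retired =
            perf_get_hw_event_config(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);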
Link: https://lore.kernel.org/r/20231110022857.1273836-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
# 54aa699e | 03-Jan-2024 | Bjorn Helgaas <bhelgaas@google.com>
arch/x86: Fix typos
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org