.. SPDX-License-Identifier: GPL-2.0

====================================
Nested KVM on POWER
====================================

Introduction
============

This document explains how a guest operating system can act as a
hypervisor and run nested guests through the use of hypercalls, if the
hypervisor has implemented them. The terms L0, L1, and L2 are used to
refer to different software entities. L0 is the hypervisor mode entity
that would normally be called the "host" or "hypervisor". L1 is a
guest virtual machine that is directly run under L0 and is initiated
and controlled by L0. L2 is a guest virtual machine that is initiated
and controlled by L1 acting as a hypervisor.

Existing API
============

Linux/KVM has had support for nesting as an L0 or L1 since 2018.

The L0 code was added::

   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:03 2018 +1100
   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

The L1 code was added::

   commit 360cae313702cdd0b90f82c261a8302fecef030a
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:04 2018 +1100
   KVM: PPC: Book3S HV: Nested guest entry via hypercall

This API works primarily using a single hcall, h_enter_nested(). This
call is made by the L1 to tell the L0 to start an L2 vCPU with the given
state. The L0 then starts this L2 and runs it until an L2 exit condition
is reached. Once the L2 exits, the state of the L2 is given back to
the L1 by the L0. The full L2 vCPU state is always transferred from
and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
-> L1 exit).

The only state kept by the L0 is the partition table. The L1 registers
its partition table using the h_set_partition_table() hcall.
All other state held by the L0 about the L2s is cached state (such as
shadow page tables).

The L1 may run any L2 or vCPU without first informing the L0. It
simply starts the vCPU using h_enter_nested(). The creation of L2s and
vCPUs is done implicitly whenever h_enter_nested() is called.

In this document, we call this existing API the v1 API.

New PAPR API
============

The new PAPR API differs from the v1 API in that the creation of an L2
and its associated vCPUs is explicit. In this document, we call this
the v2 API.

h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
and create any associated vCPUs with H_GUEST_CREATE_VCPU(). Getting
and setting vCPU state can also be performed using the
H_GUEST_{G,S}ET_STATE() hcalls.

The basic execution flow for an L1 to create an L2, run it, and
delete it is:

- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
  (normally at L1 boot time).

- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token

- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()

- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET_STATE() hcalls

- L1 requests the L0 run the vCPU using the H_GUEST_RUN_VCPU() hcall

- L1 deletes L2 with H_GUEST_DELETE()

More details of the individual hcalls follow:

HCALL Details
=============

This documentation is provided to give an overall understanding of the
API. It doesn't aim to provide all the details required to implement
an L1 or L0. Refer to the latest version of PAPR for more details.

All these HCALLs are made by the L1 to the L0.

H_GUEST_GET_CAPABILITIES()
--------------------------

This is called to get the capabilities of the L0 nested
hypervisor.
This includes capabilities such as the CPU versions (eg
POWER9, POWER10) that are supported as L2s::

  H_GUEST_GET_CAPABILITIES(uint64 flags)

  Parameters:
    Input:
      flags: Reserved
    Output:
      R3: Return code
      R4: Hypervisor Supported Capabilities bitmap 1

H_GUEST_SET_CAPABILITIES()
--------------------------

This is called to inform the L0 of the capabilities of the L1
hypervisor. The set of flags passed here is the same as for
H_GUEST_GET_CAPABILITIES().

Typically, GET will be called first and then SET will be called with a
subset of the flags returned from GET.
This process allows the L0 and
L1 to negotiate an agreed set of capabilities::

  H_GUEST_SET_CAPABILITIES(uint64 flags,
                           uint64 capabilitiesBitmap1)
  Parameters:
    Input:
      flags: Reserved
      capabilitiesBitmap1: Only capabilities advertised through
                           H_GUEST_GET_CAPABILITIES
    Output:
      R3: Return code
      R4: If R3 = H_P2: The number of invalid bitmaps
      R5: If R3 = H_P2: The index of first invalid bitmap

H_GUEST_CREATE()
----------------

This is called to create an L2. A unique ID of the L2 created
(similar to an LPID) is returned, which can be used on subsequent HCALLs to
identify the L2::

  H_GUEST_CREATE(uint64 flags,
                 uint64 continueToken);
  Parameters:
    Input:
      flags: Reserved
      continueToken: Initial call set to -1. Subsequent calls,
                     after H_Busy or H_LongBusyOrder has been
                     returned, value that was returned in R4.
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory.
        See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)
      R4: If R3 = H_Busy or H_LongBusyOrder -> continueToken

H_GUEST_CREATE_VCPU()
---------------------

This is called to create a vCPU associated with an L2. The L2 id
(returned from H_GUEST_CREATE()) should be passed in. Also passed in
is a unique (for this L2) vCPUid. This vCPUid is allocated by the
L1::

  H_GUEST_CREATE_VCPU(uint64 flags,
                      uint64 guestId,
                      uint64 vcpuId);
  Parameters:
    Input:
      flags: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU to be created. This must be within the
              range of 0 to 2047
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)

H_GUEST_GET_STATE()
-------------------

This is called to get state associated with an L2 (Guest-wide or vCPU specific).
This info is passed via the Guest State Buffer (GSB), a standard format
explained later in this document. The necessary details follow.

This can get either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset and process scoped page table
info. Examples of vCPU specific state are GPRs and VSRs. A bit in the
flags parameter specifies if this call is L2 wide or vCPU specific and
the IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to get.

The L1 writes only the IDs and sizes in the GSB. L0 writes the
associated values for each ID in the GSB::

  H_GUEST_GET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: takeOwnershipOfVcpuState Indicate the L1 is taking
           over ownership of the VCPU state and that the L0 can free
           the storage holding the state.
           The VCPU state will need to
           be returned to the Hypervisor via H_GUEST_SET_STATE prior
           to H_GUEST_RUN_VCPU being called for this VCPU. The data
           returned in the dataBuffer is in a Hypervisor internal
           format.
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
             element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
             element value.

H_GUEST_SET_STATE()
-------------------

This is called to set L2 wide or vCPU specific L2 state. This info is
passed via the Guest State Buffer (GSB). The necessary details follow.

This can set either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset and process scoped page table
info. Examples of vCPU specific state are GPRs and VSRs.
A bit in the flags
parameter specifies if this call is L2 wide or vCPU specific and the
IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to set.

The L1 writes all values in the GSB and the L0 only reads the GSB for
this call::

  H_GUEST_SET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: returnOwnershipOfVcpuState Return Guest VCPU state. See
           GET_STATE takeOwnershipOfVcpuState
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
             element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
             element value.

H_GUEST_RUN_VCPU()
------------------

This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
parameters. The vCPU runs with the state set previously using
H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
hcall.

This hcall also has associated input and output GSBs. Unlike
H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
parameters to the hcall (this was done in the interest of
performance). The locations of these GSBs must be preregistered using
the H_GUEST_SET_STATE() call with ID 0x0c00 and 0x0c01 (see table
below).

The input GSB may contain only VCPU specific elements to be set. This
GSB may also contain zero elements (ie 0 in the first 4 bytes of the
GSB) if nothing needs to be set.

On exit from the hcall, the output buffer is filled with elements
determined by the L0. The reason for the exit is contained in GPR4 (ie
NIP is put in GPR4). The elements returned depend on the exit
type.
For example, if the exit reason is the L2 doing a hcall (GPR4 =
0xc00), then GPR3-12 are provided in the output GSB as this is the
state likely needed to service the hcall. If additional state is
needed, H_GUEST_GET_STATE() may be called by the L1.

To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
the L1 may set a flag (as a hcall parameter) and the L0 will
synthesize the interrupt in the L2. Alternatively, the L1 may
synthesize the interrupt itself using H_GUEST_SET_STATE() or the
H_GUEST_RUN_VCPU() input GSB to set the state appropriately::

  H_GUEST_RUN_VCPU(uint64 flags,
                   uint64 guestId,
                   uint64 vcpuId);
  Parameters:
    Input:
      flags:
         Bit 0: generateExternalInterrupt: Generate an external interrupt
         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
         Bits 3-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
    Output:
      R3: Return code
      R4: If R3 = H_Success: The reason L1 VCPU exited (ie. NIA)
            0x000: The VCPU stopped running for an unspecified reason.
              An example of this is the Hypervisor stopping a VCPU running
              due to an outstanding interrupt for the Host Partition.
            0x980: HDEC
            0xC00: HCALL
            0xE00: HDSI
            0xE20: HISI
            0xE40: HEA
            0xF80: HV Fac Unavail
          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
            H_Invalid_Element_Value: R4 is the offset of the invalid
            element in the input buffer.

H_GUEST_DELETE()
----------------

This is called to delete an L2. All associated vCPUs are also
deleted. No specific vCPU delete call is provided.

A flag may be provided to delete all guests.
This is used to reset the
L0 in the case of kdump/kexec::

  H_GUEST_DELETE(uint64 flags,
                 uint64 guestId)
  Parameters:
    Input:
      flags:
         Bit 0: deleteAllGuests: deletes all guests
         Bits 1-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
    Output:
      R3: Return code

Guest State Buffer
==================

The Guest State Buffer (GSB) is the main method of communicating state
about the L2 between the L1 and L0 via the H_GUEST_{G,S}ET_STATE() and
H_GUEST_RUN_VCPU() calls.

State may be associated with a whole L2 (eg timebase offset) or a
specific L2 vCPU (eg GPR state). Only L2 vCPU state may be set by
H_GUEST_RUN_VCPU().

All data in the GSB is big endian (as is standard in PAPR).

The Guest State Buffer has a header which gives the number of
elements, followed by the GSB elements themselves.

GSB header:

+----------+----------+-------------------------------------------+
| Offset   | Size     | Purpose                                   |
| Bytes    | Bytes    |                                           |
+==========+==========+===========================================+
|    0     |    4     | Number of elements                        |
+----------+----------+-------------------------------------------+
|    4     |          | Guest state buffer elements               |
+----------+----------+-------------------------------------------+

GSB element:

+----------+----------+-------------------------------------------+
| Offset   | Size     | Purpose                                   |
| Bytes    | Bytes    |                                           |
+==========+==========+===========================================+
|    0     |    2     | ID                                        |
+----------+----------+-------------------------------------------+
|    2     |    2     | Size of Value                             |
+----------+----------+-------------------------------------------+
|    4     | As above | Value                                     |
+----------+----------+-------------------------------------------+

The ID in the GSB element specifies what is to be set. This includes
architected state like GPRs, VSRs, SPRs, plus also some metadata about
the partition like the timebase offset and partition scoped page
table information.
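
The header and element layouts above can be illustrated with a short
Python sketch that serializes and parses a GSB. This is only an
illustration under stated assumptions, not the kernel's implementation:
the helper names gsb_pack(), gsb_unpack() and u64() are hypothetical,
elements are assumed to be packed back to back with no padding, and the
element IDs used are taken from the ID table in this document:

```python
import struct

# Element IDs as listed in the GSB element ID table of this document.
GSB_ID_GPR0 = 0x1000  # GPR n is assumed to be 0x1000 + n (range 0x1000-0x101F)
GSB_ID_NIA = 0x1021


def u64(value):
    """Encode a 64-bit value as big endian, as all GSB data is big endian."""
    return struct.pack(">Q", value)


def gsb_pack(elements):
    """Serialize (id, value_bytes) pairs into a GSB (hypothetical helper)."""
    out = [struct.pack(">I", len(elements))]  # header: 4-byte element count
    for elem_id, value in elements:
        # Element: 2-byte ID, 2-byte size of value, then the value itself.
        out.append(struct.pack(">HH", elem_id, len(value)))
        out.append(value)
    return b"".join(out)


def gsb_unpack(buf):
    """Parse a GSB into a list of (id, value_bytes) pairs (hypothetical helper)."""
    (count,) = struct.unpack_from(">I", buf, 0)
    offset = 4
    elements = []
    for _ in range(count):
        elem_id, size = struct.unpack_from(">HH", buf, offset)
        offset += 4
        value = bytes(buf[offset:offset + size])
        if len(value) != size:
            raise ValueError("GSB is shorter than its element sizes claim")
        elements.append((elem_id, value))
        offset += size
    return elements


# Build a GSB that sets the NIA and GPR3, then parse it back.
gsb = gsb_pack([
    (GSB_ID_NIA, u64(0xC000000000001000)),
    (GSB_ID_GPR0 + 3, u64(42)),
])
decoded = gsb_unpack(gsb)
```

With zero elements, gsb_pack([]) yields just the four-byte count, which
corresponds to the empty input GSB that H_GUEST_RUN_VCPU() permits when
nothing needs to be set.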

+--------+-------+----+--------+----------------------------------+
|   ID   | Size  | RW | Thread | Details                          |
|        | Bytes |    | Guest  |                                  |
|        |       |    | Scope  |                                  |
+========+=======+====+========+==================================+
| 0x0000 |       | RW |   TG   | NOP element                      |
+--------+-------+----+--------+----------------------------------+
| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
|        |       |    |        | H_GUEST_GET_STATE:               |
|        |       |    |        | flags = takeOwnershipOfVcpuState |
+--------+-------+----+--------+----------------------------------+
| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
+--------+-------+----+--------+----------------------------------+
| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
+--------+-------+----+--------+----------------------------------+
| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
+--------+-------+----+--------+----------------------------------+
| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
|        |       |    |        |                                  |
|        |       |    |        |- 0x00 Addr part scope table      |
|        |       |    |        |- 0x08 Num addr bits              |
|        |       |    |        |- 0x10 Size root dir              |
+--------+-------+----+--------+----------------------------------+
| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr proc scope table       |
|        |       |    |        |- 0x8 Table size.                 |
+--------+-------+----+--------+----------------------------------+
| 0x0007-|       |    |        | Reserved                         |
| 0x0BFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr of buffer              |
|        |       |    |        |- 0x8 Buffer Size.                |
+--------+-------+----+--------+----------------------------------+
| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
|        |       |    |        |                                  |
|        |       |    |        |- 0x0 Addr of buffer              |
|        |       |    |        |- 0x8 Buffer Size.                |
+--------+-------+----+--------+----------------------------------+
| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
+--------+-------+----+--------+----------------------------------+
| 0x0C03-|       |    |        | Reserved                         |
| 0x0FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
| 0x101F |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x1020 | 0x08  | T  |   T    | HDEC expiry TB                   |
+--------+-------+----+--------+----------------------------------+
| 0x1021 | 0x08  | RW |   T    | NIA                              |
+--------+-------+----+--------+----------------------------------+
| 0x1022 | 0x08  | RW |   T    | MSR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1023 | 0x08  | RW |   T    | LR                               |
+--------+-------+----+--------+----------------------------------+
| 0x1024 | 0x08  | RW |   T    | XER                              |
+--------+-------+----+--------+----------------------------------+
| 0x1025 | 0x08  | RW |   T    | CTR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1026 | 0x08  | RW |   T    | CFAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1027 | 0x08  | RW |   T    | SRR0                             |
+--------+-------+----+--------+----------------------------------+
| 0x1028 | 0x08  | RW |   T    | SRR1                             |
+--------+-------+----+--------+----------------------------------+
| 0x1029 | 0x08  | RW |   T    | DAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
+--------+-------+----+--------+----------------------------------+
| 0x102B | 0x08  | RW |   T    | VTB                              |
+--------+-------+----+--------+----------------------------------+
| 0x102C | 0x08  | RW |   T    | LPCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102D | 0x08  | RW |   T    | HFSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x102E | 0x08  | RW |   T    | FSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102F | 0x08  | RW |   T    | FPSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
+--------+-------+----+--------+----------------------------------+
| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
+--------+-------+----+--------+----------------------------------+
| 0x1032 | 0x08  | RW |   T    | CIABR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1033 | 0x08  | RW |   T    | PURR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1034 | 0x08  | RW |   T    | SPURR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1035 | 0x08  | RW |   T    | IC                               |
+--------+-------+----+--------+----------------------------------+
| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
| 0x1039 |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103A | 0x08  | W  |   T    | PPR                              |
+--------+-------+----+--------+----------------------------------+
| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
| 0x103E |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103F | 0x08  | RW |   T    | MMCRA                            |
+--------+-------+----+--------+----------------------------------+
| 0x1040 | 0x08  | RW |   T    | SIER                             |
+--------+-------+----+--------+----------------------------------+
| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
+--------+-------+----+--------+----------------------------------+
| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
+--------+-------+----+--------+----------------------------------+
| 0x1043 | 0x08  | RW |   T    | BESCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1046 | 0x08  | RW |   T    | AMR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1047 | 0x08  | RW |   T    | IAMR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1048 | 0x08  | RW |   T    | AMOR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104A | 0x08  | RW |   T    | SDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104B | 0x08  | RW |   T    | SIAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104C | 0x08  | RW |   T    | DSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104D | 0x08  | RW |   T    | TAR                              |
+--------+-------+----+--------+----------------------------------+
| 
0x104E | 0x08 | RW | T | DEXCR | 540*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 541*707df298SLinus Torvalds| 0x104F | 0x08 | RW | T | HDEXCR | 542*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 543*707df298SLinus Torvalds| 0x1050 | 0x08 | RW | T | HASHKEYR | 544*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 545*707df298SLinus Torvalds| 0x1051 | 0x08 | RW | T | HASHPKEYR | 546*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 547*707df298SLinus Torvalds| 0x1052 | 0x08 | RW | T | CTRL | 548*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 549*707df298SLinus Torvalds| 0x1053-| | | | Reserved | 550*707df298SLinus Torvalds| 0x1FFF | | | | | 551*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 552*707df298SLinus Torvalds| 0x2000 | 0x04 | RW | T | CR | 553*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 554*707df298SLinus Torvalds| 0x2001 | 0x04 | RW | T | PIDR | 555*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 556*707df298SLinus Torvalds| 0x2002 | 0x04 | RW | T | DSISR | 557*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 558*707df298SLinus Torvalds| 0x2003 | 0x04 | RW | T | VSCR | 559*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 560*707df298SLinus Torvalds| 0x2004 | 0x04 | RW | T | VRSAVE | 561*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 562*707df298SLinus Torvalds| 0x2005 | 0x04 | RW | T | DAWRX0 | 563*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 564*707df298SLinus Torvalds| 0x2006 | 0x04 | RW | T | DAWRX1 | 565*707df298SLinus 
Torvalds+--------+-------+----+--------+----------------------------------+ 566*707df298SLinus Torvalds| 0x2007-| 0x04 | RW | T | PMC 1-6 | 567*707df298SLinus Torvalds| 0x200c | | | | | 568*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 569*707df298SLinus Torvalds| 0x200D | 0x04 | RW | T | WORT | 570*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 571*707df298SLinus Torvalds| 0x200E | 0x04 | RW | T | PSPB | 572*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 573*707df298SLinus Torvalds| 0x200F-| | | | Reserved | 574*707df298SLinus Torvalds| 0x2FFF | | | | | 575*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 576*707df298SLinus Torvalds| 0x3000-| 0x10 | RW | T | VSR 0-63 | 577*707df298SLinus Torvalds| 0x303F | | | | | 578*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 579*707df298SLinus Torvalds| 0x3040-| | | | Reserved | 580*707df298SLinus Torvalds| 0xEFFF | | | | | 581*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 582*707df298SLinus Torvalds| 0xF000 | 0x08 | R | T | HDAR | 583*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 584*707df298SLinus Torvalds| 0xF001 | 0x04 | R | T | HDSISR | 585*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 586*707df298SLinus Torvalds| 0xF002 | 0x04 | R | T | HEIR | 587*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 588*707df298SLinus Torvalds| 0xF003 | 0x08 | R | T | ASDR | 589*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+ 590*707df298SLinus Torvalds 591*707df298SLinus Torvalds 592*707df298SLinus TorvaldsMiscellaneous info 593*707df298SLinus Torvalds================== 594*707df298SLinus Torvalds 
State not in ptregs/hvregs
--------------------------

In the v1 API, some state is not in the ptregs/hvstate. This includes
the vector registers and some SPRs. For the L1 to set this state for
the L2, the L1 loads these hardware registers before the
h_enter_nested() call, and the L0 ensures they end up as the L2 state
(by not touching them).

The v2 API removes this implicit convention and sets this state
explicitly via the GSB.

L1 Implementation details: Caching state
----------------------------------------

In the v1 API, all state is sent from the L1 to the L0 and vice versa
on every h_enter_nested() hcall. If the L0 is not currently running
any L2s, the L0 has no state information about them. The only
exception to this is the location of the partition table, registered
via h_set_partition_table().

The v2 API changes this so that the L0 retains the L2 state even when
its vCPUs are no longer running. This means that the L1 only needs to
communicate with the L0 about L2 state when it needs to modify the L2
state, or when the L1's copy of that state is out of date. This
provides an opportunity for performance optimisation.

When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
marks all L2 state as invalid. This means that if the L1 wants to know
the L2 state (say via a kvm_get_one_reg() call), it needs to call
H_GUEST_GET_STATE() to get that state. Once it's read, it's marked as
valid in L1 until the L2 is run again.

Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
to the L0 until that L2 vCPU runs again. Hence when the L1 updates
state (say via a kvm_set_one_reg() call), it writes to an internal L1
copy and only flushes this copy to the L0 when the L2 runs again, via
the H_GUEST_RUN_VCPU() input buffer.

This lazy updating of state by the L1 avoids unnecessary
H_GUEST_{G|S}ET_STATE() calls.
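
The valid/dirty bookkeeping described above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions,
not the kernel implementation: the three-register set, the
``l2_vcpu_cache`` structure, the ``cache_*()`` helpers and the
``stub_*()`` functions standing in for the L0 are all hypothetical names
invented for this example, and no real hcall is issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register set; the real GSB has many more IDs. */
enum { REG_LR, REG_CTR, REG_XER, NREGS };

/* Hypothetical L1-side cache of one L2 vCPU's state. */
struct l2_vcpu_cache {
	uint64_t val[NREGS];
	bool valid[NREGS];	/* L1 copy is usable without asking the L0 */
	bool dirty[NREGS];	/* L1 copy must reach the L0 before the next run */
};

/* Stand-in for the state the L0 retains; not a real interface. */
static uint64_t l0_state[NREGS];
static int get_state_calls;	/* counts simulated H_GUEST_GET_STATE() calls */

/* Stub modelling H_GUEST_GET_STATE() for a single register. */
static uint64_t stub_h_guest_get_state(int reg)
{
	get_state_calls++;
	return l0_state[reg];
}

/* Stub modelling H_GUEST_RUN_VCPU(): dirty values ride in the input buffer. */
static void stub_h_guest_run_vcpu(const struct l2_vcpu_cache *c)
{
	for (int i = 0; i < NREGS; i++)
		if (c->dirty[i])
			l0_state[i] = c->val[i];
	l0_state[REG_CTR] += 1;	/* pretend the L2 ran and changed CTR */
}

/* Read path: only fetch from the L0 when the L1 copy is invalid. */
uint64_t cache_get(struct l2_vcpu_cache *c, int reg)
{
	assert(reg >= 0 && reg < NREGS);
	if (!c->valid[reg]) {
		c->val[reg] = stub_h_guest_get_state(reg);
		c->valid[reg] = true;
	}
	return c->val[reg];
}

/* Write path: update the L1 copy only and defer the transfer to the L0. */
void cache_set(struct l2_vcpu_cache *c, int reg, uint64_t v)
{
	assert(reg >= 0 && reg < NREGS);
	c->val[reg] = v;
	c->valid[reg] = true;
	c->dirty[reg] = true;
}

/* Run path: flush dirty state with the run, then invalidate everything. */
void cache_run(struct l2_vcpu_cache *c)
{
	stub_h_guest_run_vcpu(c);
	for (int i = 0; i < NREGS; i++) {
		c->valid[i] = false;
		c->dirty[i] = false;
	}
}
```

In this model a ``cache_set()`` followed by a ``cache_get()`` of the same
register never reaches the L0, and repeated reads after a run cost a
single simulated H_GUEST_GET_STATE() call per register.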