.. SPDX-License-Identifier: GPL-2.0

====================================
Nested KVM on POWER
====================================

Introduction
============

This document explains how a guest operating system can act as a
hypervisor and run nested guests through the use of hypercalls, if the
hypervisor has implemented them. The terms L0, L1, and L2 are used to
refer to different software entities. L0 is the hypervisor mode entity
that would normally be called the "host" or "hypervisor". L1 is a
guest virtual machine that is directly run under L0 and is initiated
and controlled by L0. L2 is a guest virtual machine that is initiated
and controlled by L1 acting as a hypervisor.

Existing API
============

Linux/KVM has had support for nesting as an L0 or L1 since 2018.

The L0 code was added::

   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:03 2018 +1100
   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

The L1 code was added::

   commit 360cae313702cdd0b90f82c261a8302fecef030a
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:04 2018 +1100
   KVM: PPC: Book3S HV: Nested guest entry via hypercall

This API works primarily using a single hcall, h_enter_nested(). This
call is made by the L1 to tell the L0 to start an L2 vCPU with the given
state. The L0 then starts this L2 and runs until an L2 exit condition
is reached. Once the L2 exits, the state of the L2 is given back to
the L1 by the L0. The full L2 vCPU state is always transferred to
and from the L1 when the L2 is run. The L0 doesn't keep any state on
the L2 vCPU (except in the short sequence in the L0 on L1 -> L2 entry
and L2 -> L1 exit).

The only state kept by the L0 is the partition table. The L1 registers
its partition table using the h_set_partition_table() hcall. All
other state held by the L0 about the L2s is cached state (such as
shadow page tables).

The L1 may run any L2 or vCPU without first informing the L0. It
simply starts the vCPU using h_enter_nested(). The creation of L2s and
vCPUs is done implicitly whenever h_enter_nested() is called.

In this document, we call this existing API the v1 API.

New PAPR API
============

The new PAPR API differs from the v1 API in that the creation of an L2
and its associated vCPUs is explicit. In this document, we call this
the v2 API.

h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
and create any associated vCPUs with H_GUEST_CREATE_VCPU(). vCPU state
can also be read and written using the H_GUEST_{G,S}ET_STATE() hcalls.

The basic execution flow for an L1 to create an L2, run it, and
delete it is:

- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
  (normally at L1 boot time).

- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token

- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()

- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET_STATE() hcalls

- L1 requests the L0 run the vCPU using the H_GUEST_RUN_VCPU() hcall

- L1 deletes L2 with H_GUEST_DELETE()
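
The ordering implied by this flow can be modelled as a small state
machine. The following is only an illustrative sketch: every name in it
is hypothetical, it tracks a single L2 with one vCPU, it leaves out the
boot-time capability negotiation, and it makes no hcalls. It also
assumes that state must be set at least once before the first run.

```c
#include <stdbool.h>

/* Hypothetical lifecycle stages of one L2 with a single vCPU. */
enum l2_stage {
    L2_NONE,          /* nothing created yet */
    L2_CREATED,       /* after H_GUEST_CREATE */
    L2_VCPU_CREATED,  /* after H_GUEST_CREATE_VCPU */
    L2_STATE_SET,     /* after H_GUEST_SET_STATE */
    L2_RAN,           /* after H_GUEST_RUN_VCPU has returned */
    L2_DELETED        /* after H_GUEST_DELETE */
};

/* The hcall that each step of the flow stands for. */
enum l2_step {
    STEP_CREATE,
    STEP_CREATE_VCPU,
    STEP_SET_STATE,
    STEP_RUN_VCPU,
    STEP_DELETE
};

/* Advance the model by one step; returns false if it is out of order. */
static bool l2_advance(enum l2_stage *stage, enum l2_step step)
{
    enum l2_stage s = *stage;

    switch (step) {
    case STEP_CREATE:
        if (s != L2_NONE)
            return false;
        *stage = L2_CREATED;
        return true;
    case STEP_CREATE_VCPU:
        if (s != L2_CREATED)
            return false;
        *stage = L2_VCPU_CREATED;
        return true;
    case STEP_SET_STATE:
        /* State may be set before the first run and between runs. */
        if (s != L2_VCPU_CREATED && s != L2_STATE_SET && s != L2_RAN)
            return false;
        *stage = L2_STATE_SET;
        return true;
    case STEP_RUN_VCPU:
        /* Assumption: the vCPU needs state set once, then may be re-run. */
        if (s != L2_STATE_SET && s != L2_RAN)
            return false;
        *stage = L2_RAN;
        return true;
    case STEP_DELETE:
        /* Deleting the L2 also deletes its vCPUs. */
        if (s == L2_NONE || s == L2_DELETED)
            return false;
        *stage = L2_DELETED;
        return true;
    }
    return false;
}
```

Starting from ``L2_NONE``, feeding the steps in the order listed above
ends in ``L2_DELETED``, while an early ``STEP_RUN_VCPU`` is rejected.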

More details of the individual hcalls follow:

HCALL Details
=============

This documentation is provided to give an overall understanding of the
API. It doesn't aim to provide all the details required to implement
an L1 or L0. Refer to the latest version of PAPR for more details.

All these HCALLs are made by the L1 to the L0.

H_GUEST_GET_CAPABILITIES()
--------------------------

This is called to get the capabilities of the L0 nested
hypervisor. This includes capabilities such as the CPU versions (e.g.
POWER9, POWER10) that are supported as L2s::

  H_GUEST_GET_CAPABILITIES(uint64 flags)

  Parameters:
    Input:
      flags: Reserved
    Output:
      R3: Return code
      R4: Hypervisor Supported Capabilities bitmap 1

H_GUEST_SET_CAPABILITIES()
--------------------------

This is called to inform the L0 of the capabilities of the L1
hypervisor. The set of flags passed here is the same as for
H_GUEST_GET_CAPABILITIES().

Typically, GET will be called first and then SET will be called with a
subset of the flags returned from GET. This process allows the L0 and
L1 to negotiate an agreed set of capabilities::

  H_GUEST_SET_CAPABILITIES(uint64 flags,
                           uint64 capabilitiesBitmap1)
  Parameters:
    Input:
      flags: Reserved
      capabilitiesBitmap1: Only capabilities advertised through
                           H_GUEST_GET_CAPABILITIES
    Output:
      R3: Return code
      R4: If R3 = H_P2: The number of invalid bitmaps
      R5: If R3 = H_P2: The index of first invalid bitmap
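
The GET-then-SET negotiation amounts to intersecting two bitmaps. A
minimal sketch follows, assuming the L1 keeps its own bitmap of the
capabilities it can use. The function names are hypothetical and the
meaning of individual bits is not defined here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Capabilities the L1 should pass to SET: only bits that the L0
 * offered through GET and that the L1 itself can make use of.
 */
static uint64_t caps_to_request(uint64_t l0_offered, uint64_t l1_wanted)
{
    return l0_offered & l1_wanted;
}

/* A SET request is a subset of GET if it has no bit the L0 did not offer. */
static bool caps_is_subset(uint64_t requested, uint64_t l0_offered)
{
    return (requested & ~l0_offered) == 0;
}
```

The result of ``caps_to_request()`` always passes ``caps_is_subset()``
against the same GET bitmap, which is the property the SET call relies on.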

H_GUEST_CREATE()
----------------

This is called to create an L2. A unique ID of the L2 created
(similar to an LPID) is returned, which can be used on subsequent HCALLs to
identify the L2::

  H_GUEST_CREATE(uint64 flags,
                 uint64 continueToken);
  Parameters:
    Input:
      flags: Reserved
      continueToken: Set to -1 on the initial call. On subsequent
                     calls, after H_Busy or H_LongBusyOrder has been
                     returned, the value that was returned in R4.
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)
      R4: If R3 = H_Busy or H_LongBusyOrder -> continueToken
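
The continueToken protocol can be sketched as a retry loop. This is an
illustration under stated assumptions, not the Linux implementation:
the hcall is reached through a hypothetical function pointer, the return
code values are the ones Linux uses in asm/hvcall.h, the L2 ID is assumed
to come back in R4 on success, and ``fake_guest_create()`` is a made-up
L0 that reports busy twice before succeeding.

```c
#include <stdint.h>

/* Return codes, using the values found in Linux's asm/hvcall.h. */
#define H_SUCCESS               0
#define H_BUSY                  1
#define H_LONG_BUSY_START_RANGE 9900
#define H_LONG_BUSY_END_RANGE   9905

/* Hypothetical hcall wrapper: returns R3 and stores R4 in *r4. */
typedef int64_t (*guest_create_fn)(uint64_t flags, uint64_t token,
                                   uint64_t *r4);

static int is_busy(int64_t rc)
{
    return rc == H_BUSY ||
           (rc >= H_LONG_BUSY_START_RANGE && rc <= H_LONG_BUSY_END_RANGE);
}

/*
 * Create an L2, re-issuing the hcall with the returned continueToken
 * for as long as the L0 reports that it is busy.
 */
static int64_t create_guest(guest_create_fn hcall, uint64_t *guest_id)
{
    uint64_t token = UINT64_MAX;   /* initial call: continueToken = -1 */
    uint64_t r4 = 0;
    int64_t rc;

    while (is_busy(rc = hcall(0, token, &r4)))
        token = r4;                /* R4 carries the next continueToken */

    if (rc == H_SUCCESS)
        *guest_id = r4;            /* assumption: L2 ID is returned in R4 */
    return rc;
}

/* Fake L0 for illustration: busy twice, then succeeds with ID 7. */
static int fake_calls;

static int64_t fake_guest_create(uint64_t flags, uint64_t token,
                                 uint64_t *r4)
{
    uint64_t expected = fake_calls == 0 ? UINT64_MAX
                                        : 100 + (uint64_t)fake_calls;

    (void)flags;
    if (token != expected)
        return -1;                 /* caller passed the wrong token */
    if (++fake_calls <= 2) {
        *r4 = 100 + (uint64_t)fake_calls;   /* next continueToken */
        return H_BUSY;
    }
    *r4 = 7;                       /* made-up L2 ID */
    return H_SUCCESS;
}
```

A real caller would normally sleep or yield between retries when a
long-busy code is returned; that is omitted here to keep the loop short.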

H_GUEST_CREATE_VCPU()
---------------------

This is called to create a vCPU associated with an L2. The L2 ID
(returned from H_GUEST_CREATE()) should be passed in. Also passed in
is a unique (for this L2) vCPU ID. This vCPU ID is allocated by the
L1::

  H_GUEST_CREATE_VCPU(uint64 flags,
                      uint64 guestId,
                      uint64 vcpuId);
  Parameters:
    Input:
      flags: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU to be created. This must be within the
              range of 0 to 2047
    Output:
      R3: Return code. Notable:
        H_Not_Enough_Resources: Unable to create Guest VCPU due to not
        enough Hypervisor memory. See H_GUEST_GET_STATE(flags =
        takeOwnershipOfVcpuState)

H_GUEST_GET_STATE()
-------------------

This is called to get state associated with an L2 (Guest-wide or vCPU specific).
This information is passed via the Guest State Buffer (GSB), a standard
format explained later in this document. The necessary details are below.

This can get either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset and process scoped page table
info. Examples of vCPU specific state are GPRs and VSRs. A bit in the
flags parameter specifies if this call is L2 wide or vCPU specific and
the IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to get.

The L1 writes only the IDs and sizes in the GSB. The L0 writes the
associated values for each ID in the GSB::

  H_GUEST_GET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: takeOwnershipOfVcpuState: Indicates that the L1 is
           taking over ownership of the VCPU state and that the L0 can
           free the storage holding the state. The VCPU state will need
           to be returned to the Hypervisor via H_GUEST_SET_STATE prior
           to H_GUEST_RUN_VCPU being called for this VCPU. The data
           returned in the dataBuffer is in a Hypervisor internal
           format.
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
            element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
            element value.
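
Composing the flags argument might look like the sketch below. It
assumes that PAPR numbers bits from the most significant end, so that
"Bit 0" of the 64-bit flags word is the top bit, and that reserved bits
are passed as zero. The macro and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: bits are numbered from the most significant end, so
 * "Bit 0" of a 64-bit word is 1 << 63.
 */
#define PAPR_BIT(n) (1ULL << (63 - (n)))

#define GET_STATE_GUEST_WIDE     PAPR_BIT(0)  /* getGuestWideState */
#define GET_STATE_TAKE_OWNERSHIP PAPR_BIT(1)  /* takeOwnershipOfVcpuState */
#define GET_STATE_RESERVED \
    (~(GET_STATE_GUEST_WIDE | GET_STATE_TAKE_OWNERSHIP))

/* Build the flags argument; vCPU specific requests leave bit 0 clear. */
static uint64_t get_state_flags(int guest_wide, int take_ownership)
{
    uint64_t flags = 0;

    if (guest_wide)
        flags |= GET_STATE_GUEST_WIDE;
    if (take_ownership)
        flags |= GET_STATE_TAKE_OWNERSHIP;
    return flags;
}

/* Assumption: reserved bits (2-63) must be zero in a valid request. */
static int get_state_flags_valid(uint64_t flags)
{
    return (flags & GET_STATE_RESERVED) == 0;
}
```

Keeping the bit numbering in one macro means the rest of the code can
use the bit numbers exactly as they appear in the parameter lists.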

H_GUEST_SET_STATE()
-------------------

This is called to set L2 wide or vCPU specific L2 state. This
information is passed via the Guest State Buffer (GSB). The necessary
details are below.

This can set either L2 wide or vCPU specific information. Examples of
L2 wide state are the timebase offset and process scoped page table
info. Examples of vCPU specific state are GPRs and VSRs. A bit in the
flags parameter specifies if this call is L2 wide or vCPU specific and
the IDs in the GSB must match this.

The L1 provides a pointer to the GSB as a parameter to this call. Also
provided are the L2 and vCPU IDs associated with the state to set.

The L1 writes all values in the GSB and the L0 only reads the GSB for
this call::

  H_GUEST_SET_STATE(uint64 flags,
                    uint64 guestId,
                    uint64 vcpuId,
                    uint64 dataBuffer,
                    uint64 dataBufferSizeInBytes);
  Parameters:
    Input:
      flags:
         Bit 0: getGuestWideState: Request state of the Guest instead
           of an individual VCPU.
         Bit 1: returnOwnershipOfVcpuState: Return Guest VCPU state. See
           GET_STATE takeOwnershipOfVcpuState
         Bits 2-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
      dataBuffer: An L1 real address of the GSB.
        If takeOwnershipOfVcpuState, size must be at least the size
        returned by ID=0x0001
      dataBufferSizeInBytes: Size of dataBuffer
    Output:
      R3: Return code
      R4: If R3 = H_Invalid_Element_Id: The array index of the bad
            element ID.
          If R3 = H_Invalid_Element_Size: The array index of the bad
            element size.
          If R3 = H_Invalid_Element_Value: The array index of the bad
            element value.

H_GUEST_RUN_VCPU()
------------------

This is called to run an L2 vCPU. The L2 and vCPU IDs are passed in as
parameters. The vCPU runs with the state set previously using
H_GUEST_SET_STATE(). When the L2 exits, the L1 will resume from this
hcall.

This hcall also has associated input and output GSBs. Unlike
H_GUEST_{S,G}ET_STATE(), these GSB pointers are not passed in as
parameters to the hcall (this was done in the interest of
performance). The locations of these GSBs must be preregistered using
the H_GUEST_SET_STATE() call with IDs 0x0C00 and 0x0C01 (see table
below).

The input GSB may contain only VCPU specific elements to be set. This
GSB may also contain zero elements (i.e. 0 in the first 4 bytes of the
GSB) if nothing needs to be set.

On exit from the hcall, the output buffer is filled with elements
determined by the L0. The reason for the exit is contained in GPR4
(i.e. the NIP is put in GPR4). The elements returned depend on the exit
type. For example, if the exit reason is the L2 doing a hcall (GPR4 =
0xc00), then GPR3-12 are provided in the output GSB as this is the
state likely needed to service the hcall. If additional state is
needed, H_GUEST_GET_STATE() may be called by the L1.

To synthesize interrupts in the L2, when calling H_GUEST_RUN_VCPU()
the L1 may set a flag (as a hcall parameter) and the L0 will
synthesize the interrupt in the L2. Alternatively, the L1 may
synthesize the interrupt itself using H_GUEST_SET_STATE() or the
H_GUEST_RUN_VCPU() input GSB to set the state appropriately::

  H_GUEST_RUN_VCPU(uint64 flags,
                   uint64 guestId,
                   uint64 vcpuId);
  Parameters:
    Input:
      flags:
         Bit 0: generateExternalInterrupt: Generate an external interrupt
         Bit 1: generatePrivilegedDoorbell: Generate a Privileged Doorbell
         Bit 2: sendToSystemReset: Generate a System Reset Interrupt
         Bits 3-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
      vcpuId: ID of the vCPU passed to H_GUEST_CREATE_VCPU
    Output:
      R3: Return code
      R4: If R3 = H_Success: The reason the L2 VCPU exited (i.e. NIA)
            0x000: The VCPU stopped running for an unspecified reason. An
              example of this is the Hypervisor stopping a VCPU running
              due to an outstanding interrupt for the Host Partition.
            0x980: HDEC
            0xC00: HCALL
            0xE00: HDSI
            0xE20: HISI
            0xE40: HEA
            0xF80: HV Fac Unavail
          If R3 = H_Invalid_Element_Id, H_Invalid_Element_Size, or
            H_Invalid_Element_Value: R4 is the offset of the invalid
            element in the input buffer.
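
An L1 would typically dispatch on the exit reason returned in R4. The
following sketch only classifies the values listed above; the enum and
function names are hypothetical, and any value not in the list is
reported as unknown rather than interpreted.

```c
#include <stdint.h>

/* Hypothetical names for the exit reasons listed for R4. */
enum l2_exit {
    L2_EXIT_UNSPECIFIED,
    L2_EXIT_HDEC,
    L2_EXIT_HCALL,
    L2_EXIT_HDSI,
    L2_EXIT_HISI,
    L2_EXIT_HEA,
    L2_EXIT_HV_FAC_UNAVAIL,
    L2_EXIT_UNKNOWN          /* value not in the documented list */
};

/* Map the R4 value returned with H_Success to an exit reason. */
static enum l2_exit l2_classify_exit(uint64_t r4)
{
    switch (r4) {
    case 0x000: return L2_EXIT_UNSPECIFIED;
    case 0x980: return L2_EXIT_HDEC;
    case 0xC00: return L2_EXIT_HCALL;
    case 0xE00: return L2_EXIT_HDSI;
    case 0xE20: return L2_EXIT_HISI;
    case 0xE40: return L2_EXIT_HEA;
    case 0xF80: return L2_EXIT_HV_FAC_UNAVAIL;
    default:    return L2_EXIT_UNKNOWN;
    }
}

/* Short label for logging, matching the abbreviations used above. */
static const char *l2_exit_name(enum l2_exit e)
{
    switch (e) {
    case L2_EXIT_UNSPECIFIED:    return "unspecified";
    case L2_EXIT_HDEC:           return "HDEC";
    case L2_EXIT_HCALL:          return "HCALL";
    case L2_EXIT_HDSI:           return "HDSI";
    case L2_EXIT_HISI:           return "HISI";
    case L2_EXIT_HEA:            return "HEA";
    case L2_EXIT_HV_FAC_UNAVAIL: return "HV Fac Unavail";
    default:                     return "unknown";
    }
}
```

For an ``L2_EXIT_HCALL`` exit, the L1 would then read GPR3-12 from the
output GSB, as described above.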

H_GUEST_DELETE()
----------------

This is called to delete an L2. All associated vCPUs are also
deleted. No specific vCPU delete call is provided.

A flag may be provided to delete all guests. This is used to reset the
L0 in the case of kdump/kexec::

  H_GUEST_DELETE(uint64 flags,
                 uint64 guestId)
  Parameters:
    Input:
      flags:
         Bit 0: deleteAllGuests: deletes all guests
         Bits 1-63: Reserved
      guestId: ID obtained from H_GUEST_CREATE
    Output:
      R3: Return code

Guest State Buffer
==================

The Guest State Buffer (GSB) is the main method of communicating state
about the L2 between the L1 and L0 via the H_GUEST_{G,S}ET_STATE() and
H_GUEST_RUN_VCPU() calls.

State may be associated with a whole L2 (e.g. timebase offset) or a
specific L2 vCPU (e.g. GPR state). Only L2 vCPU state may be set by
H_GUEST_RUN_VCPU().

All data in the GSB is big endian (as is standard in PAPR).

The Guest State Buffer has a header which gives the number of
elements, followed by the GSB elements themselves.

GSB header:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    4     |  Number of elements                       |
+----------+----------+-------------------------------------------+
|    4     |          |  Guest state buffer elements              |
+----------+----------+-------------------------------------------+

GSB element:

+----------+----------+-------------------------------------------+
|  Offset  |  Size    |  Purpose                                  |
|  Bytes   |  Bytes   |                                           |
+==========+==========+===========================================+
|    0     |    2     |  ID                                       |
+----------+----------+-------------------------------------------+
|    2     |    2     |  Size of Value                            |
+----------+----------+-------------------------------------------+
|    4     | As above |  Value                                    |
+----------+----------+-------------------------------------------+
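
The header and element layouts can be exercised with a small encoder
and lookup routine. This is a sketch with hypothetical helper names
that handles only 8-byte values. It writes every field big endian and
assumes the elements are packed back to back after the 4-byte header.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian store/load helpers (all GSB data is big endian). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)get_be16(p) << 16) | get_be16(p + 2);
}

/* Write an empty GSB (element count 0); returns its length, or 0 if
 * the buffer is too small. */
static size_t gsb_init(uint8_t *buf, size_t cap)
{
    if (cap < 4)
        return 0;
    put_be32(buf, 0);
    return 4;
}

/* Append one element holding a 64-bit value and bump the element
 * count; returns the new length, or 0 if there is no room. */
static size_t gsb_add_u64(uint8_t *buf, size_t len, size_t cap,
                          uint16_t id, uint64_t value)
{
    if (len < 4 || len > cap || cap - len < 12)
        return 0;
    put_be16(buf + len, id);                      /* offset 0: ID */
    put_be16(buf + len + 2, 8);                   /* offset 2: size */
    put_be32(buf + len + 4, (uint32_t)(value >> 32));
    put_be32(buf + len + 8, (uint32_t)value);     /* offset 4: value */
    put_be32(buf, get_be32(buf) + 1);
    return len + 12;
}

/* Walk the elements looking for id; returns a pointer to its value
 * and stores the value size, or NULL if absent or malformed. */
static const uint8_t *gsb_find(const uint8_t *buf, size_t len,
                               uint16_t id, uint16_t *size)
{
    size_t off = 4;
    uint32_t n;

    if (len < 4)
        return NULL;
    for (n = get_be32(buf); n > 0; n--) {
        uint16_t eid, esz;

        if (len - off < 4)
            return NULL;
        eid = get_be16(buf + off);
        esz = get_be16(buf + off + 2);
        off += 4;
        if (len - off < esz)
            return NULL;
        if (eid == id) {
            *size = esz;
            return buf + off;
        }
        off += esz;
    }
    return NULL;
}
```

The same walk applies to an output GSB filled in by the L0: the lookup
only trusts the element count and sizes after checking them against the
buffer length.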

The ID in the GSB element specifies what is to be set. This includes
architected state like GPRs, VSRs, SPRs, plus also some metadata about
the partition like the timebase offset and partition scoped page
table information.
407*707df298SLinus Torvalds
408*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
409*707df298SLinus Torvalds|   ID   | Size  | RW | Thread | Details                          |
410*707df298SLinus Torvalds|        | Bytes |    | Guest  |                                  |
411*707df298SLinus Torvalds|        |       |    | Scope  |                                  |
412*707df298SLinus Torvalds+========+=======+====+========+==================================+
413*707df298SLinus Torvalds| 0x0000 |       | RW |   TG   | NOP element                      |
414*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
415*707df298SLinus Torvalds| 0x0001 | 0x08  | R  |   G    | Size of L0 vCPU state. See:      |
416*707df298SLinus Torvalds|        |       |    |        | H_GUEST_GET_STATE:               |
417*707df298SLinus Torvalds|        |       |    |        | flags = takeOwnershipOfVcpuState |
418*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
419*707df298SLinus Torvalds| 0x0002 | 0x08  | R  |   G    | Size Run vCPU out buffer         |
420*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
421*707df298SLinus Torvalds| 0x0003 | 0x04  | RW |   G    | Logical PVR                      |
422*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
423*707df298SLinus Torvalds| 0x0004 | 0x08  | RW |   G    | TB Offset (L1 relative)          |
424*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
425*707df298SLinus Torvalds| 0x0005 | 0x18  | RW |   G    |Partition scoped page tbl info:   |
426*707df298SLinus Torvalds|        |       |    |        |                                  |
427*707df298SLinus Torvalds|        |       |    |        |- 0x00 Addr part scope table      |
428*707df298SLinus Torvalds|        |       |    |        |- 0x08 Num addr bits              |
429*707df298SLinus Torvalds|        |       |    |        |- 0x10 Size root dir              |
430*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
431*707df298SLinus Torvalds| 0x0006 | 0x10  | RW |   G    |Process Table Information:        |
432*707df298SLinus Torvalds|        |       |    |        |                                  |
433*707df298SLinus Torvalds|        |       |    |        |- 0x0 Addr proc scope table       |
434*707df298SLinus Torvalds|        |       |    |        |- 0x8 Table size.                 |
435*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
436*707df298SLinus Torvalds| 0x0007-|       |    |        | Reserved                         |
437*707df298SLinus Torvalds| 0x0BFF |       |    |        |                                  |
438*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
439*707df298SLinus Torvalds| 0x0C00 | 0x10  | RW |   T    |Run vCPU Input Buffer:            |
440*707df298SLinus Torvalds|        |       |    |        |                                  |
441*707df298SLinus Torvalds|        |       |    |        |- 0x0 Addr of buffer              |
442*707df298SLinus Torvalds|        |       |    |        |- 0x8 Buffer Size.                |
443*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
444*707df298SLinus Torvalds| 0x0C01 | 0x10  | RW |   T    |Run vCPU Output Buffer:           |
445*707df298SLinus Torvalds|        |       |    |        |                                  |
446*707df298SLinus Torvalds|        |       |    |        |- 0x0 Addr of buffer              |
447*707df298SLinus Torvalds|        |       |    |        |- 0x8 Buffer Size.                |
448*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
449*707df298SLinus Torvalds| 0x0C02 | 0x08  | RW |   T    | vCPU VPA Address                 |
450*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
451*707df298SLinus Torvalds| 0x0C03-|       |    |        | Reserved                         |
452*707df298SLinus Torvalds| 0x0FFF |       |    |        |                                  |
453*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
454*707df298SLinus Torvalds| 0x1000-| 0x08  | RW |   T    | GPR 0-31                         |
455*707df298SLinus Torvalds| 0x101F |       |    |        |                                  |
456*707df298SLinus Torvalds+--------+-------+----+--------+----------------------------------+
457*707df298SLinus Torvalds| 0x1020 |  0x08 | T  |   T    | HDEC expiry TB                   |
+--------+-------+----+--------+----------------------------------+
| 0x1021 | 0x08  | RW |   T    | NIA                              |
+--------+-------+----+--------+----------------------------------+
| 0x1022 | 0x08  | RW |   T    | MSR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1023 | 0x08  | RW |   T    | LR                               |
+--------+-------+----+--------+----------------------------------+
| 0x1024 | 0x08  | RW |   T    | XER                              |
+--------+-------+----+--------+----------------------------------+
| 0x1025 | 0x08  | RW |   T    | CTR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1026 | 0x08  | RW |   T    | CFAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1027 | 0x08  | RW |   T    | SRR0                             |
+--------+-------+----+--------+----------------------------------+
| 0x1028 | 0x08  | RW |   T    | SRR1                             |
+--------+-------+----+--------+----------------------------------+
| 0x1029 | 0x08  | RW |   T    | DAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x102A | 0x08  | RW |   T    | DEC expiry TB                    |
+--------+-------+----+--------+----------------------------------+
| 0x102B | 0x08  | RW |   T    | VTB                              |
+--------+-------+----+--------+----------------------------------+
| 0x102C | 0x08  | RW |   T    | LPCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102D | 0x08  | RW |   T    | HFSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x102E | 0x08  | RW |   T    | FSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x102F | 0x08  | RW |   T    | FPSCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1030 | 0x08  | RW |   T    | DAWR0                            |
+--------+-------+----+--------+----------------------------------+
| 0x1031 | 0x08  | RW |   T    | DAWR1                            |
+--------+-------+----+--------+----------------------------------+
| 0x1032 | 0x08  | RW |   T    | CIABR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1033 | 0x08  | RW |   T    | PURR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1034 | 0x08  | RW |   T    | SPURR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1035 | 0x08  | RW |   T    | IC                               |
+--------+-------+----+--------+----------------------------------+
| 0x1036-| 0x08  | RW |   T    | SPRG 0-3                         |
| 0x1039 |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103A | 0x08  | W  |   T    | PPR                              |
+--------+-------+----+--------+----------------------------------+
| 0x103B-| 0x08  | RW |   T    | MMCR 0-3                         |
| 0x103E |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x103F | 0x08  | RW |   T    | MMCRA                            |
+--------+-------+----+--------+----------------------------------+
| 0x1040 | 0x08  | RW |   T    | SIER                             |
+--------+-------+----+--------+----------------------------------+
| 0x1041 | 0x08  | RW |   T    | SIER 2                           |
+--------+-------+----+--------+----------------------------------+
| 0x1042 | 0x08  | RW |   T    | SIER 3                           |
+--------+-------+----+--------+----------------------------------+
| 0x1043 | 0x08  | RW |   T    | BESCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1044 | 0x08  | RW |   T    | EBBHR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1045 | 0x08  | RW |   T    | EBBRR                            |
+--------+-------+----+--------+----------------------------------+
| 0x1046 | 0x08  | RW |   T    | AMR                              |
+--------+-------+----+--------+----------------------------------+
| 0x1047 | 0x08  | RW |   T    | IAMR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1048 | 0x08  | RW |   T    | AMOR                             |
+--------+-------+----+--------+----------------------------------+
| 0x1049 | 0x08  | RW |   T    | UAMOR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104A | 0x08  | RW |   T    | SDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104B | 0x08  | RW |   T    | SIAR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104C | 0x08  | RW |   T    | DSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x104D | 0x08  | RW |   T    | TAR                              |
+--------+-------+----+--------+----------------------------------+
| 0x104E | 0x08  | RW |   T    | DEXCR                            |
+--------+-------+----+--------+----------------------------------+
| 0x104F | 0x08  | RW |   T    | HDEXCR                           |
+--------+-------+----+--------+----------------------------------+
| 0x1050 | 0x08  | RW |   T    | HASHKEYR                         |
+--------+-------+----+--------+----------------------------------+
| 0x1051 | 0x08  | RW |   T    | HASHPKEYR                        |
+--------+-------+----+--------+----------------------------------+
| 0x1052 | 0x08  | RW |   T    | CTRL                             |
+--------+-------+----+--------+----------------------------------+
| 0x1053-|       |    |        | Reserved                         |
| 0x1FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x2000 | 0x04  | RW |   T    | CR                               |
+--------+-------+----+--------+----------------------------------+
| 0x2001 | 0x04  | RW |   T    | PIDR                             |
+--------+-------+----+--------+----------------------------------+
| 0x2002 | 0x04  | RW |   T    | DSISR                            |
+--------+-------+----+--------+----------------------------------+
| 0x2003 | 0x04  | RW |   T    | VSCR                             |
+--------+-------+----+--------+----------------------------------+
| 0x2004 | 0x04  | RW |   T    | VRSAVE                           |
+--------+-------+----+--------+----------------------------------+
| 0x2005 | 0x04  | RW |   T    | DAWRX0                           |
+--------+-------+----+--------+----------------------------------+
| 0x2006 | 0x04  | RW |   T    | DAWRX1                           |
+--------+-------+----+--------+----------------------------------+
| 0x2007-| 0x04  | RW |   T    | PMC 1-6                          |
| 0x200C |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x200D | 0x04  | RW |   T    | WORT                             |
+--------+-------+----+--------+----------------------------------+
| 0x200E | 0x04  | RW |   T    | PSPB                             |
+--------+-------+----+--------+----------------------------------+
| 0x200F-|       |    |        | Reserved                         |
| 0x2FFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x3000-| 0x10  | RW |   T    | VSR 0-63                         |
| 0x303F |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0x3040-|       |    |        | Reserved                         |
| 0xEFFF |       |    |        |                                  |
+--------+-------+----+--------+----------------------------------+
| 0xF000 | 0x08  | R  |   T    | HDAR                             |
+--------+-------+----+--------+----------------------------------+
| 0xF001 | 0x04  | R  |   T    | HDSISR                           |
+--------+-------+----+--------+----------------------------------+
| 0xF002 | 0x04  | R  |   T    | HEIR                             |
+--------+-------+----+--------+----------------------------------+
| 0xF003 | 0x08  | R  |   T    | ASDR                             |
+--------+-------+----+--------+----------------------------------+


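As an illustration of how the ID ranges in the table above map to
element sizes, the following sketch returns the size in bytes of an
element given its Guest State ID. It covers only the rows from 0x0C02
onwards, and the function name gsid_size() is hypothetical; it is not
taken from the kernel source.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /*
    * Hypothetical helper: size in bytes of the element with the given
    * Guest State ID, or 0 if the ID is reserved or not covered here.
    */
   static size_t gsid_size(uint16_t id)
   {
       if (id == 0x0C02)
           return 8;   /* vCPU VPA Address */
       if (id >= 0x1000 && id <= 0x1052)
           return 8;   /* GPR 0-31 through CTRL */
       if (id >= 0x2000 && id <= 0x200E)
           return 4;   /* CR through PSPB */
       if (id >= 0x3000 && id <= 0x303F)
           return 16;  /* VSR 0-63 */
       if (id == 0xF000 || id == 0xF003)
           return 8;   /* HDAR, ASDR */
       if (id == 0xF001 || id == 0xF002)
           return 4;   /* HDSISR, HEIR */
       return 0;       /* reserved, or outside this sketch */
   }

A caller could use such a lookup to check an element's length field
before copying its data, for example rejecting a VSR element (0x3000 to
0x303F) whose length is not 0x10.
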
Miscellaneous info
==================

State not in ptregs/hvregs
--------------------------

In the v1 API, some state is not in the ptregs/hvstate. This includes
the vector registers and some SPRs. For the L1 to set this state for
the L2, the L1 loads these hardware registers before the
h_enter_nested() call, and the L0 ensures they end up as the L2 state
(by not touching them).

The v2 API removes this implicit mechanism; the L1 sets this state
explicitly via the GSB.

L1 Implementation details: Caching state
----------------------------------------

In the v1 API, all state is sent from the L1 to the L0 and vice versa
on every h_enter_nested() hcall. If the L0 is not currently running
any L2s, the L0 has no state information about them. The only
exception to this is the location of the partition table, registered
via h_set_partition_table().

The v2 API changes this so that the L0 retains the L2 state even when
its vCPUs are no longer running. This means that the L1 only needs to
communicate with the L0 about L2 state when it needs to modify the L2
state, or when the L1's copy of that state is out of date. This
provides an opportunity for performance optimisation.

When a vCPU exits from an H_GUEST_RUN_VCPU() call, the L1 internally
marks all L2 state as invalid. This means that if the L1 wants to know
the L2 state (say via a kvm_get_one_reg() call), it needs to call
H_GUEST_GET_STATE() to get that state. Once the state has been read, it
is marked as valid in the L1 until the L2 is run again.

Also, when an L1 modifies L2 vCPU state, it doesn't need to write it
to the L0 until that L2 vCPU runs again. Hence when the L1 updates
state (say via a kvm_set_one_reg() call), it writes to an internal L1
copy and only flushes this copy to the L0 when the L2 runs again, via
the H_GUEST_RUN_VCPU() input buffer.

This lazy updating of state by the L1 avoids unnecessary
H_GUEST_{G|S}ET_STATE() calls.

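The caching scheme described above can be sketched as a per-vCPU cache
with a valid flag and a dirty flag for each piece of state. This is a
simplified model, not the kernel's implementation: the struct, the
function names, the four-register state set, and the l0_state array
standing in for the L0 are all assumptions made for illustration, and
the hcalls are simulated by plain functions.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define NR_REGS 4   /* illustrative; the real state set is far larger */

   /* Stand-in for the L2 vCPU state retained by the L0 (hypothetical). */
   static uint64_t l0_state[NR_REGS];
   /* Counts simulated H_GUEST_GET_STATE() calls. */
   static unsigned int nr_get_hcalls;

   struct l1_vcpu_cache {
       uint64_t val[NR_REGS];
       bool valid[NR_REGS];   /* L1 copy is up to date */
       bool dirty[NR_REGS];   /* L1 copy not yet written to the L0 */
   };

   /* Simulates H_GUEST_GET_STATE() for a single register. */
   static uint64_t fake_get_state(int reg)
   {
       nr_get_hcalls++;
       return l0_state[reg];
   }

   /* Read path (e.g. kvm_get_one_reg()): fetch only if the copy is stale. */
   static uint64_t cache_get(struct l1_vcpu_cache *c, int reg)
   {
       if (!c->valid[reg]) {
           c->val[reg] = fake_get_state(reg);
           c->valid[reg] = true;
       }
       return c->val[reg];
   }

   /* Write path (e.g. kvm_set_one_reg()): update the L1 copy only. */
   static void cache_set(struct l1_vcpu_cache *c, int reg, uint64_t v)
   {
       c->val[reg] = v;
       c->valid[reg] = true;
       c->dirty[reg] = true;
   }

   /* Simulates H_GUEST_RUN_VCPU(): flush dirty state, then invalidate. */
   static void fake_run_vcpu(struct l1_vcpu_cache *c)
   {
       for (int i = 0; i < NR_REGS; i++) {
           if (c->dirty[i])
               l0_state[i] = c->val[i];   /* sent via the input buffer */
           c->dirty[i] = false;
           c->valid[i] = false;           /* the L2 may change it */
       }
       l0_state[0] += 4;   /* pretend the L2 ran and changed register 0 */
   }

In this model, a register written with cache_set() reaches the L0 only
on the next fake_run_vcpu(), and repeated cache_get() calls between runs
cost at most one simulated H_GUEST_GET_STATE() per register.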