Nested PAPR API (aka KVM on PowerVM)
====================================

This API aims at providing support to enable nested virtualization with
KVM on PowerVM. Existing support for nested KVM on PowerNV was introduced
with the cap-nested-hv option. Enabling nested KVM on papr/pseries involves
a slight design change, so a new cap-nested-papr option is added, e.g.:

  qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...

Work by:
    Michael Neuling <mikey@neuling.org>
    Vaibhav Jain <vaibhav@linux.ibm.com>
    Jordan Niethe <jniethe5@gmail.com>
    Harsh Prateek Bora <harshpb@linux.ibm.com>
    Shivaprasad G Bhat <sbhat@linux.ibm.com>
    Kautuk Consul <kconsul@linux.vnet.ibm.com>

The text below is taken from the kernel documentation:

Introduction
============

This document explains how a guest operating system can act as a
hypervisor and run nested guests through the use of hypercalls, if the
hypervisor has implemented them. The terms L0, L1, and L2 are used to
refer to different software entities. L0 is the hypervisor mode entity
that would normally be called the "host" or "hypervisor". L1 is a
guest virtual machine that is directly run under L0 and is initiated
and controlled by L0. L2 is a guest virtual machine that is initiated
and controlled by L1 acting as a hypervisor. A significant design change
with respect to the existing API is that the entire L2 state is now
maintained within L0.

Existing Nested-HV API
======================

Linux/KVM has had support for nesting as an L0 or L1 since 2018.

The L0 code was added::

   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:03 2018 +1100
   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization

The L1 code was added::

   commit 360cae313702cdd0b90f82c261a8302fecef030a
   Author: Paul Mackerras <paulus@ozlabs.org>
   Date:   Mon Oct 8 16:31:04 2018 +1100
   KVM: PPC: Book3S HV: Nested guest entry via hypercall

This API works primarily using a single hcall, h_enter_nested(). This
call is made by the L1 to tell the L0 to start an L2 vCPU with the given
state. The L0 then starts this L2 and runs until an L2 exit condition
is reached. Once the L2 exits, the state of the L2 is given back to
the L1 by the L0. The full L2 vCPU state is always transferred from
and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
-> L1 exit).

The only state kept by the L0 is the partition table. The L1 registers
its partition table using the h_set_partition_table() hcall. All
other state held by the L0 about the L2s is cached state (such as
shadow page tables).

The L1 may run any L2 or vCPU without first informing the L0. It
simply starts the vCPU using h_enter_nested(). The creation of L2s and
vCPUs is done implicitly whenever h_enter_nested() is called.

In this document, we call this existing API the v1 API.
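
The v1 state round trip can be pictured with a small toy model in C. This
is only a sketch under loud assumptions: toy_l2_state, toy_enter_nested()
and toy_v1_demo() are hypothetical names invented for illustration, they are
not the kernel or QEMU implementation, and "running" the L2 is reduced to
advancing a program counter field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced L2 vCPU state; the real state is far larger. */
struct toy_l2_state {
    uint64_t nia;      /* next instruction address */
    uint64_t gpr[32];  /* general purpose registers */
};

/*
 * Toy stand-in for h_enter_nested(): the L1 hands over the full vCPU
 * state, the "L0" runs the L2 from a private copy, and the updated state
 * is written back on exit. Nothing about the vCPU survives in the L0.
 */
static void toy_enter_nested(struct toy_l2_state *l1_copy)
{
    struct toy_l2_state running = *l1_copy;  /* L1 -> L2 entry */

    running.nia += 4;                        /* pretend one instruction ran */
    running.gpr[3] = 42;                     /* pretend the L2 wrote r3 */

    *l1_copy = running;                      /* L2 -> L1 exit */
}

/* Returns 0 if the L1-owned state reflects what the toy L2 did. */
int toy_v1_demo(void)
{
    struct toy_l2_state vcpu = { .nia = 0x100 };

    /* No create step: the first run implicitly brings the vCPU into being. */
    toy_enter_nested(&vcpu);
    toy_enter_nested(&vcpu);

    assert(vcpu.gpr[3] == 42);
    return vcpu.nia == 0x108 ? 0 : -1;
}
```

Calling toy_v1_demo() from a test or from main() exercises two back-to-back
runs. The point of the model is that the L1 owns the only lasting copy of
the vCPU state, so every run pays for a full transfer in both directions.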

New PAPR API
============

The new PAPR API differs from the v1 API in that the creation of an L2 and
its associated vCPUs is explicit. In this document, we call this the v2
API.

h_enter_nested() is replaced with H_GUEST_RUN_VCPU(). Before this can
be called, the L1 must explicitly create the L2 using H_GUEST_CREATE()
and any associated vCPUs using H_GUEST_CREATE_VCPU(). Getting and
setting vCPU state can also be performed using the H_GUEST_{G,S}ET()
hcall.

The basic execution flow for an L1 to create an L2, run it, and
delete it is:

- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
  (normally at L1 boot time).

- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token

- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()

- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall

- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall

- L1 deletes L2 with H_GUEST_DELETE()
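
The ordering of the steps above can be sketched with a toy model of an L0
in C. Everything here is an assumption made for illustration: the toy_*
names, the status codes, the fixed-size tables and the single "nia" state
field are hypothetical and do not reflect the real hcall arguments, return
codes or the QEMU/KVM data structures. Capability negotiation is left out
to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_GUESTS 4
#define TOY_MAX_VCPUS  4

enum toy_status { TOY_OK = 0, TOY_ERR_PARAM = -1, TOY_ERR_FULL = -2 };

/* Hypothetical, reduced vCPU state. In the v2 model it lives in the L0. */
struct toy_vcpu  { int created; uint64_t nia; };
struct toy_guest { int in_use; struct toy_vcpu vcpu[TOY_MAX_VCPUS]; };
struct toy_l0    { struct toy_guest guest[TOY_MAX_GUESTS]; };

static struct toy_guest *toy_find_guest(struct toy_l0 *l0, int token)
{
    if (token < 0 || token >= TOY_MAX_GUESTS || !l0->guest[token].in_use)
        return NULL;
    return &l0->guest[token];
}

static struct toy_vcpu *toy_find_vcpu(struct toy_l0 *l0, int token, int vid)
{
    struct toy_guest *g = toy_find_guest(l0, token);

    if (!g || vid < 0 || vid >= TOY_MAX_VCPUS || !g->vcpu[vid].created)
        return NULL;
    return &g->vcpu[vid];
}

/* Models H_GUEST_CREATE(): allocate an L2 and hand back a token. */
static int toy_guest_create(struct toy_l0 *l0, int *token)
{
    for (int i = 0; i < TOY_MAX_GUESTS; i++) {
        if (!l0->guest[i].in_use) {
            memset(&l0->guest[i], 0, sizeof(l0->guest[i]));
            l0->guest[i].in_use = 1;
            *token = i;
            return TOY_OK;
        }
    }
    return TOY_ERR_FULL;
}

/* Models H_GUEST_CREATE_VCPU(): the L2 must already exist. */
static int toy_guest_create_vcpu(struct toy_l0 *l0, int token, int vid)
{
    struct toy_guest *g = toy_find_guest(l0, token);

    if (!g || vid < 0 || vid >= TOY_MAX_VCPUS)
        return TOY_ERR_PARAM;
    g->vcpu[vid].created = 1;
    return TOY_OK;
}

/* Models the "set" half of H_GUEST_{G,S}ET() for one state element. */
static int toy_guest_set_nia(struct toy_l0 *l0, int token, int vid, uint64_t nia)
{
    struct toy_vcpu *v = toy_find_vcpu(l0, token, vid);

    if (!v)
        return TOY_ERR_PARAM;
    v->nia = nia;
    return TOY_OK;
}

/* Models the "get" half of H_GUEST_{G,S}ET() for one state element. */
static int toy_guest_get_nia(struct toy_l0 *l0, int token, int vid, uint64_t *nia)
{
    struct toy_vcpu *v = toy_find_vcpu(l0, token, vid);

    if (!v)
        return TOY_ERR_PARAM;
    *nia = v->nia;
    return TOY_OK;
}

/* Models H_GUEST_RUN_VCPU(): rejected unless L2 and vCPU were created. */
static int toy_guest_run_vcpu(struct toy_l0 *l0, int token, int vid)
{
    struct toy_vcpu *v = toy_find_vcpu(l0, token, vid);

    if (!v)
        return TOY_ERR_PARAM;
    v->nia += 4;  /* pretend one instruction ran; state stays in the L0 */
    return TOY_OK;
}

/* Models H_GUEST_DELETE(): drop the L2 and all of its vCPU state. */
static int toy_guest_delete(struct toy_l0 *l0, int token)
{
    struct toy_guest *g = toy_find_guest(l0, token);

    if (!g)
        return TOY_ERR_PARAM;
    memset(g, 0, sizeof(*g));
    return TOY_OK;
}

/* Walks the create / set / run / get / delete flow; returns 0 on success. */
int toy_v2_demo(void)
{
    struct toy_l0 l0;
    int token = -1;
    uint64_t nia = 0;

    memset(&l0, 0, sizeof(l0));

    /* Unlike v1, running before creating anything is rejected. */
    if (toy_guest_run_vcpu(&l0, 0, 0) != TOY_ERR_PARAM) return -1;
    if (toy_guest_create(&l0, &token) != TOY_OK) return -1;
    if (toy_guest_create_vcpu(&l0, token, 0) != TOY_OK) return -1;
    if (toy_guest_set_nia(&l0, token, 0, 0x100) != TOY_OK) return -1;
    if (toy_guest_run_vcpu(&l0, token, 0) != TOY_OK) return -1;
    if (toy_guest_get_nia(&l0, token, 0, &nia) != TOY_OK) return -1;
    if (toy_guest_delete(&l0, token) != TOY_OK) return -1;
    /* After deletion the token no longer names a runnable L2. */
    if (toy_guest_run_vcpu(&l0, token, 0) != TOY_ERR_PARAM) return -1;

    assert(nia == 0x104);
    return 0;
}
```

In this sketch the L1 only holds a token and a vCPU index, while the state
itself stays inside the toy L0 between runs. That mirrors the design change
noted in the introduction, where the entire L2 state is maintained within L0.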

For more details, please refer to:

[1] Linux Kernel documentation (upstream documentation commit):

commit 476652297f94a2e5e5ef29e734b0da37ade94110
Author: Michael Neuling <mikey@neuling.org>
Date:   Thu Sep 14 13:06:00 2023 +1000

    docs: powerpc: Document nested KVM on POWER

    Document support for nested KVM on POWER using the existing API as well
    as the new PAPR API. This includes the new HCALL interface and how it
    used by KVM.

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://msgid.link/20230914030600.16993-12-jniethe5@gmail.com