===================================
pSeries family boards (``pseries``)
===================================

The Power machine para-virtualized environment described by the Linux on Power
Architecture Reference ([LoPAR]_) document is called pSeries. This environment
is also known as sPAPR, System p guests, or simply Power Linux guests (although
it is capable of running other operating systems, such as AIX).

Even though pSeries is designed to behave as a guest environment, it is also
capable of acting as a hypervisor OS, providing, in that role, nested
virtualization capabilities.

Supported devices
=================

 * Multiprocessor support for many Power processor generations: POWER7,
   POWER7+, POWER8, POWER8NVL, POWER9, and Power10. Support for POWER5+ exists,
   but its state is unknown.
 * Interrupt Controller, XICS (POWER8) and XIVE (POWER9 and Power10).
 * vPHB PCIe Host bridge.
 * vscsi and vnet devices, compatible with the same devices available on a
   PowerVM hypervisor with VIOS managing LPARs.
 * Virtio based devices.
 * PCIe device passthrough.

Missing devices
===============

 * SPICE support.

Firmware
========

The pSeries platform in QEMU comes with two firmware implementations:

`SLOF <https://github.com/aik/SLOF>`_ (Slimline Open Firmware) is an
implementation of the `IEEE 1275-1994, Standard for Boot (Initialization
Configuration) Firmware: Core Requirements and Practices
<https://standards.ieee.org/standard/1275-1994.html>`_.

SLOF performs bus scanning and PCI resource allocation, and provides the client
interface to boot from block devices and the network.

QEMU includes a prebuilt image of SLOF which is updated when a more recent
version is required.

VOF (Virtual Open Firmware) is a minimalistic firmware to work with
``-machine pseries,x-vof=on``. When enabled, the firmware acts as a slim
shim and QEMU implements parts of the IEEE 1275 Open Firmware interface.

VOF does not have device drivers, does not do PCI resource allocation and
relies on ``-kernel`` used with Linux kernels recent enough (v5.4+)
to do PCI resource assignment. It is ideal to use with petitboot.
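
As a minimal sketch of a VOF boot, the following command line combines the
``x-vof=on`` machine property with ``-kernel``. The kernel and initramfs paths,
the memory size and the kernel command line are placeholders chosen for
illustration, not values required by QEMU:

.. code-block:: bash

  # ./vmlinux and ./initramfs.cpio.gz are hypothetical paths; substitute your
  # own Linux (v5.4+) kernel and initramfs images.
  qemu-system-ppc64 -machine pseries,x-vof=on -m 2G -nographic \
      -kernel ./vmlinux \
      -initrd ./initramfs.cpio.gz \
      -append "console=hvc0"

Leaving out ``x-vof=on`` should boot the same kernel through SLOF instead,
which makes it easy to compare the two firmware paths.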

Booting via ``-kernel`` supports the following:

+-------------------+-------------------+------------------+
| kernel            | pseries,x-vof=off | pseries,x-vof=on |
+===================+===================+==================+
| vmlinux BE        |     ✓             |     ✓            |
+-------------------+-------------------+------------------+
| vmlinux LE        |     ✓             |     ✓            |
+-------------------+-------------------+------------------+
| zImage.pseries BE |     ✓¹            |     ✓¹           |
+-------------------+-------------------+------------------+
| zImage.pseries LE |     ✓             |     ✓            |
+-------------------+-------------------+------------------+

¹ ``kernel-addr=0`` must be set.

Build directions
================

.. code-block:: bash

  ./configure --target-list=ppc64-softmmu && make

Running instructions
====================

Select the pSeries machine type by running QEMU with the following options:

.. code-block:: bash

  qemu-system-ppc64 -M pseries <other QEMU arguments>

sPAPR devices
=============

The sPAPR specification defines a set of para-virtualized devices, which are
also supported by the pSeries machine in QEMU and can be instantiated with the
``-device`` option:

* ``spapr-vlan``: a virtual network interface.
* ``spapr-vscsi``: a virtual SCSI disk interface.
* ``spapr-rng``: a pseudo-device for passing random number generator data to the
  guest (see the `H_RANDOM hypercall feature
  <https://wiki.qemu.org/Features/HRandomHypercall>`_ for details).
* ``spapr-vty``: a virtual teletype.
* ``spapr-pci-host-bridge``: a PCI host bridge.
* ``tpm-spapr``: a Trusted Platform Module (TPM).
* ``spapr-tpm-proxy``: a TPM proxy.

These are compatible with the devices historically available for use when
running the IBM PowerVM hypervisor with LPARs.

However, since these devices were originally specified with another
hypervisor and non-Linux guests in mind, you should use the virtio counterparts
(virtio-net, virtio-blk/scsi and virtio-rng, for instance) if possible,
since they will most probably give you better performance with Linux guests in a
QEMU environment.
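
For illustration, the virtio counterparts mentioned above could be attached as
in the following sketch. The disk image name and the ``net0``, ``hd0`` and
``rng0`` IDs are arbitrary placeholders, and user-mode networking is only an
assumption made to keep the example self-contained:

.. code-block:: bash

  # disk.qcow2 is a hypothetical guest image.
  qemu-system-ppc64 -M pseries -m 2G -nographic \
      -netdev user,id=net0 \
      -device virtio-net-pci,netdev=net0 \
      -drive file=disk.qcow2,if=none,format=qcow2,id=hd0 \
      -device virtio-blk-pci,drive=hd0 \
      -object rng-random,filename=/dev/urandom,id=rng0 \
      -device virtio-rng-pci,rng=rng0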

The pSeries machine in QEMU is always instantiated with the following devices:

* An NVRAM device (``spapr-nvram``).
* A virtual teletype (``spapr-vty``).
* A PCI host bridge (``spapr-pci-host-bridge``).

Hence, there is no need to add them manually, unless you use the ``-nodefaults``
command line option in QEMU.

To make the contents of the default ``spapr-nvram`` device persistent, specify
a PFLASH device when starting QEMU, i.e. either use
``-drive if=pflash,file=<filename>,format=raw`` to set the default PFLASH
device, or specify one with an ID
(``-drive if=none,file=<filename>,format=raw,id=pfid``) and pass that ID to the
NVRAM device with ``-global spapr-nvram.drive=pfid``.

sPAPR specification
-------------------

The main source of documentation on the sPAPR standard is the [LoPAR]_ document.
However, documentation specific to QEMU's implementation of the specification
can also be found in QEMU documentation:

.. toctree::
   :maxdepth: 1

   ../../specs/ppc-spapr-hotplug.rst
   ../../specs/ppc-spapr-hcalls.rst
   ../../specs/ppc-spapr-numa.rst
   ../../specs/ppc-spapr-uv-hcalls.rst
   ../../specs/ppc-spapr-xive.rst

Switching between the KVM-PR and KVM-HV kernel module
=====================================================

Currently, there are two implementations of KVM on Power, ``kvm_hv.ko`` and
``kvm_pr.ko``.

If a host supports both KVM modes, and both KVM kernel modules are loaded, it is
possible to switch between the two modes with the ``kvm-type`` parameter:

* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR`` to use the
  ``kvm_pr.ko`` kernel module.
* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV`` to use ``kvm_hv.ko``
  instead.

KVM-PR
------

KVM-PR uses the so-called **PR**\ oblem state of the PPC CPUs to run the guests,
i.e. the virtual machine is run in user mode and all privileged instructions
trap and have to be emulated by the host.
That means you can run KVM-PR inside
a pSeries guest (or a PowerVM LPAR for that matter), and that is where it
originated, as historically (prior to POWER7) it was not possible to run Linux
in hypervisor mode on a Power processor (this function was restricted to
PowerVM, the IBM proprietary hypervisor).

Because all privileged instructions are trapped, guests that use a lot of
privileged instructions run quite slowly with KVM-PR. On the other hand, because
of that, this kernel module can run on pretty much any PPC hardware, and is
able to emulate a lot of guest CPUs. This module can even be used to run other
PowerPC guests like an emulated PowerMac.

As KVM-PR can be run inside a pSeries guest, it can also provide nested
virtualization capabilities (i.e. running a guest from within a guest).

Note that, as KVM-HV provides much better execution performance, maintenance
work has been much more focused on it in the past years. Maintenance for
KVM-PR has been minimal.

In order to run KVM-PR guests with POWER9 processors, start QEMU with the
``kernel_irqchip=off`` command line option.

KVM-HV
------

KVM-HV uses the hypervisor mode of more recent Power processors, which allows
access to the bare metal hardware directly.
Although POWER7 had this capability,
it was only starting with POWER8 that this was officially supported by IBM.

Originally, KVM-HV was only available when running on a PowerNV platform (a.k.a.
Power bare metal). Although it runs on a PowerNV platform, it can only be used
to start pSeries guests. As the pSeries guest doesn't have access to the
hypervisor mode of the Power CPU, it wasn't possible to run KVM-HV on a guest.
This limitation has been lifted, and now it is possible to run KVM-HV inside
pSeries guests as well, making nested virtualization possible with KVM-HV.

As KVM-HV has access to privileged instructions, guests that use a lot of these
can run much faster than with KVM-PR. On the other hand, the guest CPU has to be
of the same type as the host CPU this way, e.g. it is not possible to specify an
embedded PPC CPU for the guest with KVM-HV. However, there is at least the
possibility to run the guest in a backward-compatibility mode of the previous
CPU generations, e.g. you can run a POWER7 guest on a POWER8 host by using
``-cpu POWER8,compat=power7`` as parameter to QEMU.

Modules support
===============

As noted in the sections above, each module can run in a different
environment. The following table shows the environments in which each module
can run.
As long as you are in a supported environment, you can run KVM-PR or KVM-HV
nested. Combinations not shown in the table are not available.

+--------------+------------+------+-------------------+----------+--------+
| Platform     | Host type  | Bits | Page table format | KVM-HV   | KVM-PR |
+==============+============+======+===================+==========+========+
| PowerNV      | bare metal | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | yes      | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes      | no     |
+--------------+------------+------+-------------------+----------+--------+
| pSeries [1]_ | PowerNV    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes [2]_ | no     |
|              +------------+------+-------------------+----------+--------+
|              | PowerVM    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix [3]_        | no       | yes    |
+--------------+------------+------+-------------------+----------+--------+

.. [1] On POWER9 DD2.1 processors, the page table format on the host and guest
   must be the same.

.. [2] KVM-HV cannot run nested on POWER8 machines.

.. [3] Introduced on Power10 machines.


.. _power-papr-protected-execution-facility-pef:

POWER (PAPR) Protected Execution Facility (PEF)
-----------------------------------------------

Protected Execution Facility (PEF), also known as Secure Guest support,
is a feature found on IBM POWER9 and Power10 processors.

If a suitable firmware including an Ultravisor is installed, it adds
an extra memory protection mode to the CPU. The ultravisor manages a
pool of secure memory which cannot be accessed by the hypervisor.

When this feature is enabled in QEMU, a guest can use ultracalls to
enter "secure mode". This transfers most of its memory to secure
memory, where it cannot be eavesdropped on by a compromised hypervisor.

Launching
^^^^^^^^^

To launch a guest which will be permitted to enter PEF secure mode::

  $ qemu-system-ppc64 \
      -object pef-guest,id=pef0 \
      -machine confidential-guest-support=pef0 \
      ...

Live Migration
^^^^^^^^^^^^^^

Live migration is not yet implemented for PEF guests. For
consistency, QEMU currently prevents migration if the PEF feature is
enabled, whether or not the guest has actually entered secure mode.


Maintainer contact information
==============================

Cédric Le Goater <clg@kaod.org>

Daniel Henrique Barboza <danielhb413@gmail.com>

.. [LoPAR] `Linux on Power Architecture Reference document (LoPAR) revision
   2.9 <https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_.