=============================
sPAPR Dynamic Reconfiguration
=============================

sPAPR or pSeries guests make use of a facility called dynamic reconfiguration
to handle hot plugging of dynamic "physical" resources like PCI cards, or
"logical"/para-virtual resources like memory, CPUs, and "physical"
host-bridges, which are generally managed by the host/hypervisor and provided
to guests as virtualized resources. The specifics of dynamic reconfiguration
are documented extensively in section 13 of the Linux on Power Architecture
Reference document ([LoPAR]_). This document provides a summary of that
information as it applies to the implementation within QEMU.

Dynamic-reconfiguration Connectors
==================================

To manage hot plug/unplug of these resources, a firmware abstraction known as
a Dynamic Resource Connector (DRC) is used to assign a particular dynamic
resource to the guest, and provide an interface for the guest to manage
configuration/removal of the resource associated with it.

Device tree description of DRCs
===============================

A set of four Open Firmware device tree array properties is used to describe
the name/index/power-domain/type of each DRC allocated to a guest at
boot time. There may be multiple sets of these arrays, rooted at different
paths in the device tree depending on the type of resource the DRCs manage.

In some cases, the DRCs themselves may be provided by a dynamic resource,
such as the DRCs managing PCI slots on a hot plugged PHB. In this case the
arrays would be fetched as part of the device tree retrieval interfaces
for hot plugged resources described under :ref:`guest-host-interface`.

The array properties are described below. Each entry/element in an array
describes the DRC identified by the element in the corresponding position
of ``ibm,drc-indexes``:

``ibm,drc-names``
-----------------

  First 4-bytes: big-endian (BE) encoded integer denoting the number of entries.

  Each entry: a NULL-terminated ``<name>`` string encoded as a byte array.

    ``<name>`` values for logical/virtual resources are defined in the Linux on
    Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically
    consist of the type of the resource followed by a space and a numerical
    value that's unique across resources of that type.

    ``<name>`` values for "physical" resources such as PCI or VIO devices are
    defined as being "location codes", which are the "location labels" of each
    encapsulating device, starting from the chassis down to the individual slot
    for the device, concatenated by a hyphen. This provides a mapping of
    resources to a physical location in a chassis for debugging purposes. For
    QEMU, this mapping is less important, so we assign a location code that
    conforms to naming specifications, but is simply a location label for the
    slot by itself to simplify the implementation. The naming convention for
    location labels is documented in detail in the [LoPAR]_ section 12.3.1.5,
    and in our case amounts to using ``C<n>`` for PCI/VIO device slots, where
    ``<n>`` is unique across all PCI/VIO device slots.

``ibm,drc-indexes``
-------------------

  First 4-bytes: BE-encoded integer denoting the number of entries.

  Each 4-byte entry: BE-encoded ``<index>`` integer that is unique across all
  DRCs in the machine.

    ``<index>`` is arbitrary, but in the case of QEMU we try to maintain the
    convention used to assign them to pSeries guests on pHyp (the hypervisor
    portion of PowerVM):

      ``bit[31:28]``: integer encoding of ``<type>``, where ``<type>`` is:

        ``1`` for CPU resource.

        ``2`` for PHB resource.

        ``3`` for VIO resource.

        ``4`` for PCI resource.

        ``8`` for memory resource.

      ``bit[27:0]``: integer encoding of ``<id>``, where ``<id>`` is unique
      across all resources of specified type.

``ibm,drc-power-domains``
-------------------------

  First 4-bytes: BE-encoded integer denoting the number of entries.

  Each 4-byte entry: 32-bit, BE-encoded ``<index>`` integer that specifies the
  power domain the resource will be assigned to. In the case of QEMU we
  associate all resources with a "live insertion" domain, where the power is
  assumed to be managed automatically. The integer value for this domain is a
  special value of ``-1``.

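
The index convention and the count-prefixed array layout described above can
be sketched in a few lines of C. This is only an illustration, not QEMU's
implementation: the names ``drc_index_make``, ``drc_index_type``,
``drc_index_id``, ``put_be32`` and ``drc_indexes_encode`` are hypothetical
helpers introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* <type> values for bits 31:28 of a DRC index, as listed above. */
enum drc_type_id {
    DRC_TYPE_ID_CPU = 1,
    DRC_TYPE_ID_PHB = 2,
    DRC_TYPE_ID_VIO = 3,
    DRC_TYPE_ID_PCI = 4,
    DRC_TYPE_ID_MEM = 8
};

#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    0x0fffffffu

/* Hypothetical helper: combine <type> and <id> into a 32-bit DRC index. */
static uint32_t drc_index_make(enum drc_type_id type, uint32_t id)
{
    return ((uint32_t)type << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

/* Extract the <type> field (bits 31:28). */
static uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}

/* Extract the <id> field (bits 27:0). */
static uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical encoder for an ibm,drc-indexes style property: a BE entry
 * count followed by one BE 32-bit index per DRC. Returns the number of
 * bytes written, or 0 if the buffer is too small.
 */
static size_t drc_indexes_encode(uint8_t *buf, size_t buflen,
                                 const uint32_t *indexes, uint32_t count)
{
    size_t need = 4 + (size_t)count * 4;
    uint32_t i;

    if (buflen < need) {
        return 0;
    }
    put_be32(buf, count);
    for (i = 0; i < count; i++) {
        put_be32(buf + 4 + (size_t)i * 4, indexes[i]);
    }
    return need;
}
```

Under this sketch, ``drc_index_make(DRC_TYPE_ID_MEM, 5)`` yields
``0x80000005``, and encoding two indexes produces a 12-byte property value
(4 bytes of count plus two 4-byte entries).
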

``ibm,drc-types``
-----------------

  First 4-bytes: BE-encoded integer denoting the number of entries.

  Each entry: a NULL-terminated ``<type>`` string encoded as a byte array.
  ``<type>`` is assigned as follows:

    "CPU" for a CPU.

    "PHB" for a physical host-bridge.

    "SLOT" for a VIO slot.

    "28" for a PCI slot.

    "MEM" for memory resource.

.. _guest-host-interface:

Guest->Host interface to manage dynamic resources
=================================================

Each DRC is given a globally unique DRC index, and resources associated with a
particular DRC are configured/managed by the guest via a number of RTAS calls
which reference individual DRCs based on the DRC index. This can be considered
the guest->host interface.

``rtas-set-power-level``
------------------------

Set the power level for a specified power domain.

  ``arg[0]``: integer identifying power domain.

  ``arg[1]``: new power level for the domain, ``0-100``.

  ``output[0]``: status, ``0`` on success.

  ``output[1]``: power level after command.

``rtas-get-power-level``
------------------------

Get the power level for a specified power domain.

  ``arg[0]``: integer identifying power domain.

  ``output[0]``: status, ``0`` on success.

  ``output[1]``: current power level.

``rtas-set-indicator``
----------------------

Set the state of an indicator or sensor.

  ``arg[0]``: integer identifying sensor/indicator type.

  ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
  index.

  ``arg[2]``: desired sensor value.

  ``output[0]``: status, ``0`` on success.

For the purpose of this document we focus on the indicator/sensor types
associated with a DRC. The types are:

* ``9001``: ``isolation-state``, controls/indicates whether a device has been
  made accessible to a guest. Supported sensor values:

    ``0``: ``isolate``, device is made inaccessible by guest OS.

    ``1``: ``unisolate``, device is made available to guest OS.

* ``9002``: ``dr-indicator``, controls "visual" indicator associated with
  device. Supported sensor values:

    ``0``: ``inactive``, resource may be safely removed.

    ``1``: ``active``, resource is in use and cannot be safely removed.

    ``2``: ``identify``, used to visually identify slot for interactive hot plug.

    ``3``: ``action``, in most cases, used in the same manner as identify.

* ``9003``: ``allocation-state``, generally only used for "logical" DR resources
  to request the allocation/deallocation of a resource prior to acquiring it via
  ``isolation-state->unisolate``, or after releasing it via
  ``isolation-state->isolate``, respectively. For "physical" DR (like PCI
  hot plug/unplug) the pre-allocation of the resource is implied and this sensor
  is unused. Supported sensor values:

    ``0``: ``unusable``, tell firmware/system the resource can be
    unallocated/reclaimed and added back to the system resource pool.

    ``1``: ``usable``, request the resource be allocated/reserved for use by
    guest OS.

    ``2``: ``exchange``, used to allocate a spare resource to use for fail-over
    in certain situations. Unused in QEMU.

    ``3``: ``recover``, used to reclaim a previously allocated resource that's
    not currently allocated to the guest OS. Unused in QEMU.

``rtas-get-sensor-state``
-------------------------

Used to read an indicator or sensor value.

  ``arg[0]``: integer identifying sensor/indicator type.

  ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
  index.

  ``output[0]``: status, ``0`` on success.

For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``,
which has a type value of ``9003``, as ``allocation-state`` does in the case of
``rtas-set-indicator``. The semantics/encodings of the sensor values are
distinct, however.

Supported sensor values for ``dr-entity-sense`` (``9003``) sensor:

  ``0``: empty.

    For physical resources: DRC/slot is empty.

    For logical resources: unused.

  ``1``: present.

    For physical resources: DRC/slot is populated with a device/resource.

    For logical resources: resource has been allocated to the DRC.

  ``2``: unusable.

    For physical resources: unused.

    For logical resources: DRC has no resource allocated to it.

  ``3``: exchange.

    For physical resources: unused.

    For logical resources: resource available for exchange (see
    ``allocation-state`` sensor semantics above).

  ``4``: recovery.

    For physical resources: unused.

    For logical resources: resource available for recovery (see
    ``allocation-state`` sensor semantics above).

``rtas-ibm-configure-connector``
--------------------------------

Used to fetch an Open Firmware device tree description of the resource
associated with a particular DRC.

  ``arg[0]``: guest physical address of 4096-byte work area buffer.

  ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only
  non-zero if a prior RTAS response indicated a need for additional memory.

  ``output[0]``: status:

    ``0``: completed transmittal of device tree node.

    ``1``: instruct guest to prepare for next device tree sibling node.

    ``2``: instruct guest to prepare for next device tree child node.

    ``3``: instruct guest to prepare for next device tree property.

    ``4``: instruct guest to ascend to parent device tree node.

    ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``.

    ``990x``: instruct guest that operation took too long and to try again
    later.

The DRC index is encoded in the first 4-bytes of the first work area buffer.
Work area (``wa``) layout, using 4-byte offsets:

  ``wa[0]``: DRC index of the DRC to fetch device tree nodes from.

  ``wa[1]``: ``0`` (hard-coded).

  ``wa[2]``:

    For next-sibling/next-child response:

      ``wa`` offset of null-terminated string denoting the new node's name.

    For next-property response:

      ``wa`` offset of null-terminated string denoting new property's name.

  ``wa[3]``: for next-property response (unused otherwise):

      Byte-length of new property's value.

  ``wa[4]``: for next-property response (unused otherwise):

      New property's value, encoded as an OFDT-compatible byte array.

Hot plug/unplug events
======================

For most DR operations, the hypervisor will issue host->guest add/remove events
using the EPOW/check-exception notification framework, where the host issues a
check-exception interrupt, then provides an RTAS event log via an
rtas-check-exception call issued by the guest in response. This framework is
documented by PAPR+ v2.7, and is already in use by QEMU for generating
powerdown requests via EPOW events.

For DR, this framework has been extended to include hotplug events, which were
previously unneeded due to direct manipulation of DR-related guest userspace
tools by host-level management such as an HMC. This level of management is not
applicable to KVM on Power, hence the reason for extending the notification
framework to support hotplug events.

The format for these EPOW-signalled events is described below under
:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally
part of the PAPR+ specification, and have been superseded by a newer format,
also described below under :ref:`hot-plug-unplug-event-structure`, and so are
now deemed a "legacy" format. The formats are similar, but the "modern" format
contains additional fields/flags, which are denoted for the purposes of this
documentation with ``#ifdef GUEST_SUPPORTS_MODERN`` guards.

QEMU should assume support only for "legacy" fields/flags unless the guest
advertises support for the "modern" format via
``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its
``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_,
section B.5.2.3). As with "legacy" format events, "modern" format events are
surfaced to the guest via check-exception RTAS calls, but use a dedicated event
source to signal the guest. This event source is advertised to the guest by the
addition of a ``hot-plug-events`` node under the ``/event-sources`` node of the
guest's device tree using the standard format described in [LoPAR]_,
section B.5.12.2.

.. _hot-plug-unplug-event-structure:

Hot plug/unplug event structure
===============================

The hot plug specific payload in QEMU is implemented as follows (with all values
encoded in big-endian format):

.. code-block:: c

  struct rtas_event_log_v6_hp {
  #define SECTION_ID_HOTPLUG              0x4850 /* HP */
      struct section_header {
          uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
          uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
                                           * plus the length of the DRC name
                                           * if a DRC name identifier is
                                           * specified for hotplug_identifier
                                           */
          uint8_t section_version;        /* version 1 */
          uint8_t section_subtype;        /* unused */
          uint16_t creator_component_id;  /* unused */
      } hdr;
  #define RTAS_LOG_V6_HP_TYPE_CPU         1
  #define RTAS_LOG_V6_HP_TYPE_MEMORY      2
  #define RTAS_LOG_V6_HP_TYPE_SLOT        3
  #define RTAS_LOG_V6_HP_TYPE_PHB         4
  #define RTAS_LOG_V6_HP_TYPE_PCI         5
      uint8_t hotplug_type;               /* type of resource/device */
  #define RTAS_LOG_V6_HP_ACTION_ADD       1
  #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
      uint8_t hotplug_action;             /* action (add/remove) */
  #define RTAS_LOG_V6_HP_ID_DRC_NAME          1
  #define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
  #define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
  #ifdef GUEST_SUPPORTS_MODERN
  #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
  #endif
      uint8_t hotplug_identifier;         /* type of the resource identifier,
                                           * which serves as the discriminator
                                           * for the 'drc' union field below
                                           */
  #ifdef GUEST_SUPPORTS_MODERN
      uint8_t capabilities;               /* capability flags, currently unused
                                           * by QEMU
                                           */
  #else
      uint8_t reserved;
  #endif
      union {
          uint32_t index;                 /* DRC index of resource to take action
                                           * on
                                           */
          uint32_t count;                 /* number of DR resources to take
                                           * action on (guest chooses which)
                                           */
  #ifdef GUEST_SUPPORTS_MODERN
          struct {
              uint32_t count;             /* number of DR resources to take
                                           * action on
                                           */
              uint32_t index;             /* DRC index of first resource to take
                                           * action on. guest will take action
                                           * on DRC index <index> through
                                           * DRC index <index + count - 1> in
                                           * sequential order
                                           */
          } count_indexed;
  #endif
          char name[1];                   /* string representing the name of the
                                           * DRC to take action on
                                           */
      } drc;
  } QEMU_PACKED;

``ibm,lrdr-capacity``
=====================

``ibm,lrdr-capacity`` is a property in the ``/rtas`` device tree node that
identifies the dynamic reconfiguration capabilities of the guest. It is a
triple consisting of ``<phys>``, ``<size>`` and ``<maxcpus>``.

  ``<phys>``, encoded in BE format, represents the maximum address in bytes and
  hence the maximum memory that can be allocated to the guest.

  ``<size>``, encoded in BE format, represents the size increments in which
  memory can be hot-plugged to the guest.

  ``<maxcpus>``, a BE-encoded integer, represents the maximum number of
  processors that the guest can have.

``pseries`` guests use this property to note the maximum allowed CPUs for the
guest.

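
As an illustration of how a consumer might read this triple, the following
sketch decodes a raw property value. The field widths are an assumption, since
the text above does not state them: ``<phys>`` and ``<size>`` are treated as
64-bit values and ``<maxcpus>`` as a 32-bit value, for 20 bytes in total. The
names ``struct lrdr_capacity`` and ``lrdr_capacity_decode`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the triple; field widths are an assumption (see above). */
struct lrdr_capacity {
    uint64_t phys;     /* maximum guest address in bytes */
    uint64_t size;     /* memory hot plug granularity in bytes */
    uint32_t maxcpus;  /* maximum number of processors */
};

/* Read a big-endian 32-bit value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value. */
static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Hypothetical decoder: returns 0 on success, -1 if the property does not
 * have the assumed 20-byte length.
 */
static int lrdr_capacity_decode(const uint8_t *prop, size_t len,
                                struct lrdr_capacity *out)
{
    if (len != 20) {
        return -1;
    }
    out->phys = get_be64(prop);
    out->size = get_be64(prop + 8);
    out->maxcpus = get_be32(prop + 16);
    return 0;
}
```

If the actual property uses different widths, only the length check and the
offsets in ``lrdr_capacity_decode`` would need to change.
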

``ibm,dynamic-reconfiguration-memory``
======================================

``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents
dynamically reconfigurable logical memory blocks (LMB). This node is generated
only when the guest advertises support for it via the
``ibm,client-architecture-support`` call. Memory that is not dynamically
reconfigurable is represented by ``/memory`` nodes. The properties of this node
that are of interest to the sPAPR memory hotplug implementation in QEMU are
described here.

``ibm,lmb-size``
----------------

This 64-bit integer defines the size of each dynamically reconfigurable LMB.

``ibm,associativity-lookup-arrays``
-----------------------------------

This property defines a lookup array in which the NUMA associativity
information for each LMB can be found. It is a property encoded array
that begins with an integer M, the number of associativity lists, followed
by an integer N, the number of entries per associativity list, and terminated
by M associativity lists each of length N integers.

This property provides the same information as given by the
``ibm,associativity`` property in a ``/memory`` node. Each assigned LMB has an
index value between 0 and M-1 which is used as an index into this table to
select which associativity list to use for the LMB. This index value for each
LMB is defined in the ``ibm,dynamic-memory`` property.

``ibm,dynamic-memory``
----------------------

This property describes the dynamically reconfigurable memory. It is a
property encoded array that has an integer N, the number of LMBs, followed
by N LMB list entries.

Each LMB list entry consists of the following elements:

- Logical address of the start of the LMB encoded as a 64-bit integer. This
  corresponds to the ``reg`` property in a ``/memory`` node.
- DRC index of the LMB that corresponds to the ``ibm,my-drc-index`` property
  in a ``/memory`` node.
- Four bytes reserved for expansion.
- Associativity list index for the LMB that is used as an index into the
  ``ibm,associativity-lookup-arrays`` property described earlier. This is used
  to retrieve the right associativity list to be used for this LMB.
- A 32-bit flags word. The bit at bit position ``0x00000008`` defines whether
  the LMB is assigned to the partition as of boot time.

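
The entry layout listed above can be read back with a small decoder such as
the one sketched here. It assumes that the leading count, the DRC index and
the associativity list index are each 32-bit big-endian values, which makes
every entry 24 bytes long; the names ``struct lmb_entry``,
``LMB_FLAG_ASSIGNED`` and ``dynamic_memory_get`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_BYTES   24          /* 8 + 4 + 4 + 4 + 4, assumed widths */
#define LMB_FLAG_ASSIGNED 0x00000008u /* LMB assigned to partition at boot */

/* Decoded view of one LMB list entry (reserved bytes are skipped). */
struct lmb_entry {
    uint64_t addr;       /* logical start address of the LMB */
    uint32_t drc_index;  /* DRC index of the LMB */
    uint32_t aa_index;   /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;      /* flags word */
};

/* Read a big-endian 32-bit value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value. */
static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Hypothetical accessor: decode entry number i of an ibm,dynamic-memory
 * property value. Returns 0 on success, -1 if i is out of range or the
 * property is too short to hold that entry.
 */
static int dynamic_memory_get(const uint8_t *prop, size_t len, uint32_t i,
                              struct lmb_entry *out)
{
    const uint8_t *e;

    if (len < 4) {
        return -1;
    }
    if (i >= get_be32(prop) || (len - 4) / LMB_ENTRY_BYTES <= i) {
        return -1;
    }
    e = prop + 4 + (size_t)i * LMB_ENTRY_BYTES;
    out->addr = get_be64(e);
    out->drc_index = get_be32(e + 8);
    /* e + 12 .. e + 15: four bytes reserved for expansion */
    out->aa_index = get_be32(e + 16);
    out->flags = get_be32(e + 20);
    return 0;
}
```

A caller would then test ``entry.flags & LMB_FLAG_ASSIGNED`` to see whether
the LMB was assigned to the partition at boot time.
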

``ibm,dynamic-memory-v2``
-------------------------

This property describes the dynamically reconfigurable memory. This is
an alternate and newer way to describe dynamically reconfigurable memory.
It is a property encoded array that has an integer N (the number of
LMB set entries) followed by N LMB set entries. There is an LMB set entry
for each sequential group of LMBs that share common attributes.

Each LMB set entry consists of the following elements:

- Number of sequential LMBs in the entry represented by a 32-bit integer.
- Logical address of the first LMB in the set encoded as a 64-bit integer.
- DRC index of the first LMB in the set.
- Associativity list index that is used as an index into
  ``ibm,associativity-lookup-arrays`` property described earlier. This
  is used to retrieve the right associativity list to be used for all
  the LMBs in this set.
- A 32-bit flags word that applies to all the LMBs in the set.
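
To show how one set entry stands in for several LMBs, the sketch below decodes
a set entry and derives the attributes of its k-th LMB. It rests on two
assumptions that the list above implies but does not spell out: the DRC index
and associativity list index are 32-bit big-endian values (24 bytes per set
entry), and "sequential" means consecutive DRC indexes with addresses spaced
``ibm,lmb-size`` apart. The names ``struct lmb_set``, ``struct lmb``,
``lmb_set_decode`` and ``lmb_set_expand`` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LMB_SET_BYTES 24 /* 4 + 8 + 4 + 4 + 4, assumed widths */

/* Decoded view of one LMB set entry. */
struct lmb_set {
    uint32_t seq_lmbs;   /* number of sequential LMBs in the set */
    uint64_t base_addr;  /* logical address of the first LMB */
    uint32_t drc_index;  /* DRC index of the first LMB */
    uint32_t aa_index;   /* associativity list index for the whole set */
    uint32_t flags;      /* flags word for the whole set */
};

/* Attributes of a single LMB derived from a set entry. */
struct lmb {
    uint64_t addr;
    uint32_t drc_index;
    uint32_t aa_index;
    uint32_t flags;
};

/* Read a big-endian 32-bit value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one set entry; returns -1 if fewer than 24 bytes are available. */
static int lmb_set_decode(const uint8_t *p, size_t len, struct lmb_set *out)
{
    if (len < LMB_SET_BYTES) {
        return -1;
    }
    out->seq_lmbs = get_be32(p);
    out->base_addr = ((uint64_t)get_be32(p + 4) << 32) | get_be32(p + 8);
    out->drc_index = get_be32(p + 12);
    out->aa_index = get_be32(p + 16);
    out->flags = get_be32(p + 20);
    return 0;
}

/*
 * Derive the k-th LMB of a set (k counts from 0), assuming consecutive DRC
 * indexes and addresses spaced lmb_size apart. Returns -1 if k is outside
 * the set.
 */
static int lmb_set_expand(const struct lmb_set *set, uint64_t lmb_size,
                          uint32_t k, struct lmb *out)
{
    if (k >= set->seq_lmbs) {
        return -1;
    }
    out->addr = set->base_addr + (uint64_t)k * lmb_size;
    out->drc_index = set->drc_index + k;
    out->aa_index = set->aa_index;
    out->flags = set->flags;
    return 0;
}
```

Compared with ``ibm,dynamic-memory``, this representation needs one 24-byte
entry per run of LMBs with shared attributes instead of one per LMB.
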