=============================
sPAPR Dynamic Reconfiguration
=============================

sPAPR or pSeries guests make use of a facility called dynamic reconfiguration
to handle hot plugging of dynamic "physical" resources like PCI cards, or
"logical"/para-virtual resources like memory, CPUs, and "physical"
host-bridges, which are generally managed by the host/hypervisor and provided
to guests as virtualized resources. The specifics of dynamic reconfiguration
are documented extensively in section 13 of the Linux on Power Architecture
Reference document ([LoPAR]_). This document provides a summary of that
information as it applies to the implementation within QEMU.

Dynamic-reconfiguration Connectors
==================================

To manage hot plug/unplug of these resources, a firmware abstraction known as
a Dynamic Reconfiguration Connector (DRC) is used to assign a particular
dynamic resource to the guest, and provide an interface for the guest to manage
configuration/removal of the resource associated with it.

Device tree description of DRCs
===============================

A set of four Open Firmware device tree array properties is used to describe
the name/index/power-domain/type of each DRC allocated to a guest at
boot time. There may be multiple sets of these arrays, rooted at different
paths in the device tree depending on the type of resource the DRCs manage.

In some cases, the DRCs themselves may be provided by a dynamic resource,
such as the DRCs managing PCI slots on a hot plugged PHB. In this case the
arrays would be fetched as part of the device tree retrieval interfaces
for hot plugged resources described under :ref:`guest-host-interface`.

The array properties are described below. Each entry/element in an array
describes the DRC identified by the element in the corresponding position
of ``ibm,drc-indexes``:

``ibm,drc-names``
-----------------

  First 4 bytes: big-endian (BE) encoded integer denoting the number of entries.

  Each entry: a NULL-terminated ``<name>`` string encoded as a byte array.

    ``<name>`` values for logical/virtual resources are defined in the Linux on
    Power Architecture Reference ([LoPAR]_) section 13.5.2.4, and basically
    consist of the type of the resource followed by a space and a numerical
    value that's unique across resources of that type.

    ``<name>`` values for "physical" resources such as PCI or VIO devices are
    defined as being "location codes", which are the "location labels" of each
    encapsulating device, starting from the chassis down to the individual slot
    for the device, concatenated by a hyphen. This provides a mapping of
    resources to a physical location in a chassis for debugging purposes. For
    QEMU, this mapping is less important, so we assign a location code that
    conforms to naming specifications, but is simply a location label for the
    slot by itself to simplify the implementation. The naming convention for
    location labels is documented in detail in the [LoPAR]_ section 12.3.1.5,
    and in our case amounts to using ``C<n>`` for PCI/VIO device slots, where
    ``<n>`` is unique across all PCI/VIO device slots.
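
As an illustration of the string-array layout above, the following minimal
sketch packs a list of names into a caller-supplied buffer. The helpers
``put_be32`` and ``drc_pack_strings`` are hypothetical names chosen for this
example, not QEMU functions. The same layout applies to ``ibm,drc-types``.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: pack `count` names into `buf` as a 4-byte BE entry
 * count followed by the NUL-terminated strings. Returns the number of bytes
 * written, or 0 if `buf_len` is too small.
 */
static size_t drc_pack_strings(uint8_t *buf, size_t buf_len,
                               const char *const *names, uint32_t count)
{
    size_t off = 4;
    uint32_t i;

    if (buf_len < 4) {
        return 0;
    }
    put_be32(buf, count);
    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1; /* include the terminator */

        if (len > buf_len - off) {
            return 0;
        }
        memcpy(buf + off, names[i], len);
        off += len;
    }
    return off;
}
```

For example, packing the two slot names ``C1`` and ``C2`` produces a 10-byte
value: the count ``00 00 00 02`` followed by ``C1\0C2\0``.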

``ibm,drc-indexes``
-------------------

  First 4 bytes: BE-encoded integer denoting the number of entries.

  Each 4-byte entry: BE-encoded ``<index>`` integer that is unique across all
  DRCs in the machine.

    ``<index>`` is arbitrary, but in the case of QEMU we try to maintain the
    convention used to assign them to pSeries guests on pHyp (the hypervisor
    portion of PowerVM):

      ``bit[31:28]``: integer encoding of ``<type>``, where ``<type>`` is:

        ``1`` for CPU resource.

        ``2`` for PHB resource.

        ``3`` for VIO resource.

        ``4`` for PCI resource.

        ``8`` for memory resource.

      ``bit[27:0]``: integer encoding of ``<id>``, where ``<id>`` is unique
      across all resources of specified type.
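
The bit layout above can be sketched with a few helpers. The enum and function
names below are illustrative only and are not taken from the QEMU sources; the
numeric type codes are the ones listed above.

```c
#include <stdint.h>

/* Type codes from the list above; the names are illustrative. */
enum drc_type_code {
    DRC_TYPE_CODE_CPU    = 1,
    DRC_TYPE_CODE_PHB    = 2,
    DRC_TYPE_CODE_VIO    = 3,
    DRC_TYPE_CODE_PCI    = 4,
    DRC_TYPE_CODE_MEMORY = 8,
};

#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    0x0fffffffu

/* Compose an index from a type code (bits 31:28) and an id (bits 27:0). */
static inline uint32_t drc_index_make(enum drc_type_code type, uint32_t id)
{
    return ((uint32_t)type << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

/* Extract the type code from bits 31:28. */
static inline uint32_t drc_index_type(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}

/* Extract the per-type id from bits 27:0. */
static inline uint32_t drc_index_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}
```

Under this convention a memory DRC with ``<id>`` 5 gets the index
``0x80000005``.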

``ibm,drc-power-domains``
-------------------------

  First 4 bytes: BE-encoded integer denoting the number of entries.

  Each 4-byte entry: 32-bit, BE-encoded ``<index>`` integer that specifies the
  power domain the resource will be assigned to. In the case of QEMU we
  associate all resources with a "live insertion" domain, where the power is
  assumed to be managed automatically. The integer value for this domain is a
  special value of ``-1``.

``ibm,drc-types``
-----------------

  First 4 bytes: BE-encoded integer denoting the number of entries.

  Each entry: a NULL-terminated ``<type>`` string encoded as a byte array.
  ``<type>`` is assigned as follows:

    "CPU" for a CPU.

    "PHB" for a physical host-bridge.

    "SLOT" for a VIO slot.

    "28" for a PCI slot.

    "MEM" for memory resource.
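
Assuming the DRC index convention described under ``ibm,drc-indexes``, the
``<type>`` string for a DRC can be derived from the top four bits of its index.
The function name below is hypothetical and the sketch is not the QEMU
implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map the type code in bits 31:28 of a DRC index to the
 * corresponding ibm,drc-types string. Returns NULL for an unknown type code.
 */
static const char *drc_type_string(uint32_t drc_index)
{
    switch (drc_index >> 28) {
    case 1:
        return "CPU";
    case 2:
        return "PHB";
    case 3:
        return "SLOT";
    case 4:
        return "28";
    case 8:
        return "MEM";
    default:
        return NULL;
    }
}
```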

.. _guest-host-interface:

Guest->Host interface to manage dynamic resources
=================================================

Each DRC is given a globally unique DRC index, and resources associated with a
particular DRC are configured/managed by the guest via a number of RTAS calls
which reference individual DRCs based on the DRC index. This can be considered
the guest->host interface.

``rtas-set-power-level``
------------------------

Set the power level for a specified power domain.

  ``arg[0]``: integer identifying power domain.

  ``arg[1]``: new power level for the domain, ``0-100``.

  ``output[0]``: status, ``0`` on success.

  ``output[1]``: power level after command.

``rtas-get-power-level``
------------------------

Get the power level for a specified power domain.

  ``arg[0]``: integer identifying power domain.

  ``output[0]``: status, ``0`` on success.

  ``output[1]``: current power level.

``rtas-set-indicator``
----------------------

Set the state of an indicator or sensor.

  ``arg[0]``: integer identifying sensor/indicator type.

  ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
  index.

  ``arg[2]``: desired sensor value.

  ``output[0]``: status, ``0`` on success.

For the purpose of this document we focus on the indicator/sensor types
associated with a DRC. The types are:

* ``9001``: ``isolation-state``, controls/indicates whether a device has been
  made accessible to a guest. Supported sensor values:

    ``0``: ``isolate``, device is made inaccessible to the guest OS.

    ``1``: ``unisolate``, device is made available to the guest OS.

* ``9002``: ``dr-indicator``, controls "visual" indicator associated with
  device. Supported sensor values:

    ``0``: ``inactive``, resource may be safely removed.

    ``1``: ``active``, resource is in use and cannot be safely removed.

    ``2``: ``identify``, used to visually identify slot for interactive hot plug.

    ``3``: ``action``, in most cases, used in the same manner as identify.

* ``9003``: ``allocation-state``, generally only used for "logical" DR resources
  to request the allocation/deallocation of a resource prior to acquiring it via
  ``isolation-state->unisolate``, or after releasing it via
  ``isolation-state->isolate``, respectively. For "physical" DR (like PCI
  hot plug/unplug) the pre-allocation of the resource is implied and this sensor
  is unused. Supported sensor values:

    ``0``: ``unusable``, tell firmware/system the resource can be
    unallocated/reclaimed and added back to the system resource pool.

    ``1``: ``usable``, request the resource be allocated/reserved for use by
    guest OS.

    ``2``: ``exchange``, used to allocate a spare resource to use for fail-over
    in certain situations. Unused in QEMU.

    ``3``: ``recover``, used to reclaim a previously allocated resource that's
    not currently allocated to the guest OS. Unused in QEMU.
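
The indicator types and value ranges listed above can be captured in a small
validity check. This is a sketch only: the enum and function names are
illustrative and do not claim to match the identifiers used in QEMU. It checks
the ranges given in this document and does not model which values QEMU
actually acts on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Indicator type numbers from the list above; the names are illustrative. */
enum dr_indicator_type {
    DR_IND_ISOLATION_STATE  = 9001,
    DR_IND_DR_INDICATOR     = 9002,
    DR_IND_ALLOCATION_STATE = 9003,
};

/* Return true if `value` is one of the documented values for `type`. */
static bool dr_indicator_value_valid(uint32_t type, uint32_t value)
{
    switch (type) {
    case DR_IND_ISOLATION_STATE:
        return value <= 1;  /* isolate, unisolate */
    case DR_IND_DR_INDICATOR:
        return value <= 3;  /* inactive, active, identify, action */
    case DR_IND_ALLOCATION_STATE:
        return value <= 3;  /* unusable, usable, exchange, recover */
    default:
        return false;       /* not a DR-related indicator */
    }
}
```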

``rtas-get-sensor-state``
-------------------------

Used to read an indicator or sensor value.

  ``arg[0]``: integer identifying sensor/indicator type.

  ``arg[1]``: index of sensor, for DR-related sensors this is generally the DRC
  index.

  ``output[0]``: status, ``0`` on success.

For DR-related operations, the only noteworthy sensor is ``dr-entity-sense``,
which has a type value of ``9003``, as ``allocation-state`` does in the case of
``rtas-set-indicator``. The semantics/encodings of the sensor values are
distinct, however.

Supported sensor values for ``dr-entity-sense`` (``9003``) sensor:

  ``0``: empty.

    For physical resources: DRC/slot is empty.

    For logical resources: unused.

  ``1``: present.

    For physical resources: DRC/slot is populated with a device/resource.

    For logical resources: resource has been allocated to the DRC.

  ``2``: unusable.

    For physical resources: unused.

    For logical resources: DRC has no resource allocated to it.

  ``3``: exchange.

    For physical resources: unused.

    For logical resources: resource available for exchange (see
    ``allocation-state`` sensor semantics above).

  ``4``: recovery.

    For physical resources: unused.

    For logical resources: resource available for recovery (see
    ``allocation-state`` sensor semantics above).
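
The table of ``dr-entity-sense`` values above distinguishes physical from
logical resources. A minimal sketch of that distinction follows; the enum and
function names are illustrative, not QEMU identifiers.

```c
#include <stdbool.h>

/* dr-entity-sense values from the list above; the names are illustrative. */
enum dr_entity_sense {
    DR_ENTITY_SENSE_EMPTY    = 0,
    DR_ENTITY_SENSE_PRESENT  = 1,
    DR_ENTITY_SENSE_UNUSABLE = 2,
    DR_ENTITY_SENSE_EXCHANGE = 3,
    DR_ENTITY_SENSE_RECOVERY = 4,
};

/*
 * Return true if `state` is a value that is used for the given resource
 * class (logical or physical) according to the list above.
 */
static bool dr_entity_sense_used(enum dr_entity_sense state, bool logical)
{
    switch (state) {
    case DR_ENTITY_SENSE_EMPTY:
        return !logical;    /* only physical slots can be empty */
    case DR_ENTITY_SENSE_PRESENT:
        return true;        /* used by both classes */
    case DR_ENTITY_SENSE_UNUSABLE:
    case DR_ENTITY_SENSE_EXCHANGE:
    case DR_ENTITY_SENSE_RECOVERY:
        return logical;     /* only logical resources */
    }
    return false;
}
```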

``rtas-ibm-configure-connector``
--------------------------------

Used to fetch an Open Firmware device tree description of the resource
associated with a particular DRC.

  ``arg[0]``: guest physical address of 4096-byte work area buffer.

  ``arg[1]``: 0, or address of additional 4096-byte work area buffer; only
  non-zero if a prior RTAS response indicated a need for additional memory.

  ``output[0]``: status:

    ``0``: completed transmittal of device tree node.

    ``1``: instruct guest to prepare for next device tree sibling node.

    ``2``: instruct guest to prepare for next device tree child node.

    ``3``: instruct guest to prepare for next device tree property.

    ``4``: instruct guest to ascend to parent device tree node.

    ``5``: instruct guest to provide additional work-area buffer via ``arg[1]``.

    ``990x``: instruct guest that operation took too long and to try again
    later.
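
Because the guest calls ``rtas-ibm-configure-connector`` repeatedly, the status
values above effectively drive a tree walk. The sketch below replays a recorded
sequence of status values and counts the nodes announced. It is a simplified
model under stated assumptions, not guest or QEMU code: the names are
hypothetical, and any value of the form ``990x`` is treated as "retry".

```c
#include <stddef.h>
#include <stdint.h>

/* Status values from the list above; the names are illustrative. */
enum cc_status {
    CC_COMPLETE      = 0,
    CC_NEXT_SIBLING  = 1,
    CC_NEXT_CHILD    = 2,
    CC_NEXT_PROPERTY = 3,
    CC_PREV_PARENT   = 4,
    CC_MORE_MEMORY   = 5,
};

/*
 * Replay a recorded sequence of status values and count the device tree
 * nodes it announces. Returns the node count once CC_COMPLETE is seen, or
 * -1 if the sequence is malformed or ends without completing.
 */
static int cc_count_nodes(const int32_t *status, size_t n)
{
    int depth = 0;
    int nodes = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (status[i] / 10 == 990) {
            continue;           /* 990x: busy, the guest retries later */
        }
        switch (status[i]) {
        case CC_COMPLETE:
            return nodes;
        case CC_NEXT_SIBLING:
            nodes++;
            break;
        case CC_NEXT_CHILD:
            nodes++;
            depth++;
            break;
        case CC_NEXT_PROPERTY:
        case CC_MORE_MEMORY:
            break;              /* no change in tree position */
        case CC_PREV_PARENT:
            if (depth == 0) {
                return -1;      /* cannot ascend above the starting point */
            }
            depth--;
            break;
        default:
            return -1;          /* unknown or error status */
        }
    }
    return -1;
}
```

For a hypothetical trace ``2, 3, 3, 2, 3, 4, 0`` (child, two properties,
child, property, parent, complete) the function reports two nodes.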

The DRC index is encoded in the first 4 bytes of the first work area buffer.
Work area (``wa``) layout, using 4-byte offsets:

  ``wa[0]``: DRC index of the DRC to fetch device tree nodes from.

  ``wa[1]``: ``0`` (hard-coded).

  ``wa[2]``:

    For next-sibling/next-child response:

      ``wa`` offset of null-terminated string denoting the new node's name.

    For next-property response:

      ``wa`` offset of null-terminated string denoting new property's name.

  ``wa[3]``: for next-property response (unused otherwise):

      Byte-length of new property's value.

  ``wa[4]``: for next-property response (unused otherwise):

      New property's value, encoded as an OFDT-compatible byte array.

Hot plug/unplug events
======================

For most DR operations, the hypervisor will issue host->guest add/remove events
using the EPOW/check-exception notification framework, where the host issues a
check-exception interrupt, then provides an RTAS event log via an
rtas-check-exception call issued by the guest in response. This framework is
documented by PAPR+ v2.7, and is already in use by QEMU for generating
powerdown requests via EPOW events.

For DR, this framework has been extended to include hotplug events, which were
previously unneeded due to direct manipulation of DR-related guest userspace
tools by host-level management such as an HMC. This level of management is not
applicable to KVM on Power, hence the reason for extending the notification
framework to support hotplug events.

The format for these EPOW-signalled events is described below under
:ref:`hot-plug-unplug-event-structure`. Note that these events are not formally
part of the PAPR+ specification. The original format has been superseded by a
newer one, described in the same section, and is now deemed a "legacy" format.
The formats are similar, but the "modern" format contains additional
fields/flags, which are denoted for the purposes of this documentation with
``#ifdef GUEST_SUPPORTS_MODERN`` guards.

QEMU should assume support only for "legacy" fields/flags unless the guest
advertises support for the "modern" format via
``ibm,client-architecture-support`` hcall by setting byte 5, bit 6 of its
``ibm,architecture-vec-5`` option vector structure (as described by [LoPAR]_,
section B.5.2.3). As with "legacy" format events, "modern" format events are
surfaced to the guest via check-exception RTAS calls, but use a dedicated event
source to signal the guest. This event source is advertised to the guest by the
addition of a ``hot-plug-events`` node under the ``/event-sources`` node of the
guest's device tree using the standard format described in [LoPAR]_,
section B.5.12.2.

.. _hot-plug-unplug-event-structure:

Hot plug/unplug event structure
===============================

The hot plug specific payload in QEMU is implemented as follows (with all values
encoded in big-endian format):

.. code-block:: c

   struct rtas_event_log_v6_hp {
   #define SECTION_ID_HOTPLUG              0x4850 /* HP */
       struct section_header {
           uint16_t section_id;            /* set to SECTION_ID_HOTPLUG */
           uint16_t section_length;        /* sizeof(rtas_event_log_v6_hp),
                                            * plus the length of the DRC name
                                            * if a DRC name identifier is
                                            * specified for hotplug_identifier
                                            */
           uint8_t section_version;        /* version 1 */
           uint8_t section_subtype;        /* unused */
           uint16_t creator_component_id;  /* unused */
       } hdr;
   #define RTAS_LOG_V6_HP_TYPE_CPU         1
   #define RTAS_LOG_V6_HP_TYPE_MEMORY      2
   #define RTAS_LOG_V6_HP_TYPE_SLOT        3
   #define RTAS_LOG_V6_HP_TYPE_PHB         4
   #define RTAS_LOG_V6_HP_TYPE_PCI         5
       uint8_t hotplug_type;               /* type of resource/device */
   #define RTAS_LOG_V6_HP_ACTION_ADD       1
   #define RTAS_LOG_V6_HP_ACTION_REMOVE    2
       uint8_t hotplug_action;             /* action (add/remove) */
   #define RTAS_LOG_V6_HP_ID_DRC_NAME          1
   #define RTAS_LOG_V6_HP_ID_DRC_INDEX         2
   #define RTAS_LOG_V6_HP_ID_DRC_COUNT         3
   #ifdef GUEST_SUPPORTS_MODERN
   #define RTAS_LOG_V6_HP_ID_DRC_COUNT_INDEXED 4
   #endif
       uint8_t hotplug_identifier;         /* type of the resource identifier,
                                            * which serves as the discriminator
                                            * for the 'drc' union field below
                                            */
   #ifdef GUEST_SUPPORTS_MODERN
       uint8_t capabilities;               /* capability flags, currently unused
                                            * by QEMU
                                            */
   #else
       uint8_t reserved;
   #endif
       union {
           uint32_t index;                 /* DRC index of resource to take action
                                            * on
                                            */
           uint32_t count;                 /* number of DR resources to take
                                            * action on (guest chooses which)
                                            */
   #ifdef GUEST_SUPPORTS_MODERN
           struct {
               uint32_t count;             /* number of DR resources to take
                                            * action on
                                            */
               uint32_t index;             /* DRC index of first resource to take
                                            * action on. guest will take action
                                            * on DRC index <index> through
                                            * DRC index <index + count - 1> in
                                            * sequential order
                                            */
           } count_indexed;
   #endif
           char name[1];                   /* string representing the name of the
                                            * DRC to take action on
                                            */
       } drc;
   } QEMU_PACKED;
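
To make the byte layout of the structure above concrete, the following sketch
serializes a "legacy" section that identifies the resource by DRC index. It is
an illustration under stated assumptions, not the QEMU implementation: the
function name and ``HP_SECTION_LEGACY_LEN`` are hypothetical, and the length of
16 bytes assumes the packed legacy layout, in which the ``drc`` union occupies
4 bytes.

```c
#include <stdint.h>

#define SECTION_ID_HOTPLUG    0x4850 /* "HP" */
#define HP_SECTION_LEGACY_LEN 16     /* assumed: 8-byte header + 4 + 4 */

/* Store 16-bit and 32-bit values in big-endian byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: write a legacy hot plug section that identifies the
 * resource by DRC index (hotplug_identifier == 2).
 */
static void hp_section_encode_index(uint8_t out[HP_SECTION_LEGACY_LEN],
                                    uint8_t type, uint8_t action,
                                    uint32_t drc_index)
{
    put_be16(out + 0, SECTION_ID_HOTPLUG);    /* section_id */
    put_be16(out + 2, HP_SECTION_LEGACY_LEN); /* section_length */
    out[4] = 1;                               /* section_version */
    out[5] = 0;                               /* section_subtype, unused */
    put_be16(out + 6, 0);                     /* creator_component_id, unused */
    out[8] = type;                            /* hotplug_type */
    out[9] = action;                          /* hotplug_action */
    out[10] = 2;                              /* RTAS_LOG_V6_HP_ID_DRC_INDEX */
    out[11] = 0;                              /* reserved */
    put_be32(out + 12, drc_index);            /* drc.index */
}
```

Encoding a memory add event (type ``2``, action ``1``) for DRC index
``0x80000001`` yields a buffer that starts with the ASCII bytes ``HP`` and ends
with ``80 00 00 01``.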

``ibm,lrdr-capacity``
=====================

``ibm,lrdr-capacity`` is a property in the ``/rtas`` device tree node that
identifies the dynamic reconfiguration capabilities of the guest. It is a
triple consisting of ``<phys>``, ``<size>`` and ``<maxcpus>``.

  ``<phys>``, encoded in BE format, represents the maximum address in bytes and
  hence the maximum memory that can be allocated to the guest.

  ``<size>``, encoded in BE format, represents the size increments in which
  memory can be hot-plugged to the guest.

  ``<maxcpus>``, a BE-encoded integer, represents the maximum number of
  processors that the guest can have.

``pseries`` guests use this property to note the maximum allowed CPUs for the
guest.

``ibm,dynamic-reconfiguration-memory``
======================================

``ibm,dynamic-reconfiguration-memory`` is a device tree node that represents
dynamically reconfigurable logical memory blocks (LMB). This node is generated
only when the guest advertises support for it via the
``ibm,client-architecture-support`` call. Memory that is not dynamically
reconfigurable is represented by ``/memory`` nodes. The properties of this node
that are of interest to the sPAPR memory hotplug implementation in QEMU are
described here.

``ibm,lmb-size``
----------------

This 64-bit integer defines the size of each dynamically reconfigurable LMB.

``ibm,associativity-lookup-arrays``
-----------------------------------

This property defines a lookup array in which the NUMA associativity
information for each LMB can be found. It is a property-encoded array
that begins with an integer M, the number of associativity lists, followed
by an integer N, the number of entries per associativity list, and then
M associativity lists, each N integers long.

This property provides the same information as given by the
``ibm,associativity`` property in a ``/memory`` node. Each assigned LMB has an
index value between 0 and M-1 which is used as an index into this table to
select which associativity list to use for the LMB. This index value for each
LMB is defined in the ``ibm,dynamic-memory`` property.
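
The lookup described for ``ibm,associativity-lookup-arrays`` can be sketched as
follows. The sketch assumes that every integer in the property is a 32-bit
big-endian cell, which this document does not state explicitly, and the
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: return a pointer to the first cell of associativity
 * list `idx` and store the list length N in `*entries`. Returns NULL if the
 * index is out of range or the property is too short for M lists of N cells.
 */
static const uint8_t *assoc_list(const uint8_t *prop, size_t prop_len,
                                 uint32_t idx, uint32_t *entries)
{
    uint32_t m, n;

    if (prop_len < 8) {
        return NULL;
    }
    m = get_be32(prop);     /* number of associativity lists */
    n = get_be32(prop + 4); /* entries per list */
    if (idx >= m) {
        return NULL;
    }
    if ((uint64_t)m * n > (prop_len - 8) / 4) {
        return NULL;        /* truncated property */
    }
    *entries = n;
    return prop + 8 + (size_t)idx * n * 4;
}
```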

``ibm,dynamic-memory``
----------------------

This property describes the dynamically reconfigurable memory. It is a
property-encoded array that has an integer N, the number of LMBs, followed
by N LMB list entries.

Each LMB list entry consists of the following elements:

- Logical address of the start of the LMB encoded as a 64-bit integer. This
  corresponds to the ``reg`` property in a ``/memory`` node.
- DRC index of the LMB that corresponds to the ``ibm,my-drc-index`` property
  in a ``/memory`` node.
- Four bytes reserved for expansion.
- Associativity list index for the LMB that is used as an index into the
  ``ibm,associativity-lookup-arrays`` property described earlier. This is used
  to retrieve the right associativity list to be used for this LMB.
- A 32-bit flags word. The bit with mask ``0x00000008`` defines whether
  the LMB is assigned to the partition as of boot time.
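
A minimal sketch of decoding one LMB list entry follows. It assumes that the
entry count, the DRC index, and the associativity list index are each 32-bit
big-endian integers, giving 24 bytes per entry; the widths of those fields are
an assumption of this example, and the struct and function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LMB_ENTRY_SIZE    24          /* assumed: 8 + 4 + 4 + 4 + 4 bytes */
#define LMB_FLAG_ASSIGNED 0x00000008u /* LMB assigned at boot time */

struct lmb_entry {
    uint64_t addr;        /* logical start address */
    uint32_t drc_index;   /* DRC index of the LMB */
    uint32_t assoc_index; /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;       /* flags word */
};

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: decode entry `i` of an ibm,dynamic-memory value.
 * Returns 0 on success, -1 if `i` is out of range or the data is truncated.
 */
static int lmb_entry_parse(const uint8_t *prop, size_t prop_len, uint32_t i,
                           struct lmb_entry *out)
{
    const uint8_t *p;

    if (prop_len < 4 || i >= get_be32(prop)) {
        return -1;
    }
    if ((prop_len - 4) / LMB_ENTRY_SIZE <= i) {
        return -1;
    }
    p = prop + 4 + (size_t)i * LMB_ENTRY_SIZE;
    out->addr = ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
    out->drc_index = get_be32(p + 8);
    /* p + 12: four bytes reserved for expansion */
    out->assoc_index = get_be32(p + 16);
    out->flags = get_be32(p + 20);
    return 0;
}
```

A caller would test ``flags & LMB_FLAG_ASSIGNED`` to decide whether the LMB is
already assigned to the partition.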

``ibm,dynamic-memory-v2``
-------------------------

This property is an alternate, newer way to describe the dynamically
reconfigurable memory. It is a property-encoded array that has an integer N
(the number of LMB set entries) followed by N LMB set entries. There is an LMB
set entry for each sequential group of LMBs that share common attributes.

Each LMB set entry consists of the following elements:

- Number of sequential LMBs in the entry represented by a 32-bit integer.
- Logical address of the first LMB in the set encoded as a 64-bit integer.
- DRC index of the first LMB in the set.
- Associativity list index that is used as an index into the
  ``ibm,associativity-lookup-arrays`` property described earlier. This
  is used to retrieve the right associativity list to be used for all
  the LMBs in this set.
- A 32-bit flags word that applies to all the LMBs in the set.
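
An LMB set entry can be expanded into its individual LMBs. The sketch below
assumes that the LMBs of a set are laid out back to back, ``ibm,lmb-size``
bytes apart, and that their DRC indexes are consecutive starting from the first
one. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded LMB set entry; field names are illustrative. */
struct lmb_set {
    uint32_t count;       /* number of sequential LMBs in the set */
    uint64_t addr;        /* logical address of the first LMB */
    uint32_t drc_index;   /* DRC index of the first LMB */
    uint32_t assoc_index; /* index into ibm,associativity-lookup-arrays */
    uint32_t flags;       /* flags shared by all LMBs in the set */
};

/*
 * Hypothetical helper: compute the address and DRC index of the k-th LMB
 * (counting from 0) in a set. Returns 0 on success, -1 if k is out of range.
 */
static int lmb_set_member(const struct lmb_set *set, uint64_t lmb_size,
                          uint32_t k, uint64_t *addr, uint32_t *drc_index)
{
    if (k >= set->count) {
        return -1;
    }
    *addr = set->addr + (uint64_t)k * lmb_size; /* assumed contiguous */
    *drc_index = set->drc_index + k;            /* assumed consecutive */
    return 0;
}
```

With an LMB size of ``0x10000000`` (256 MiB) and a set that starts at address
``0x10000000`` with DRC index ``0x80000010``, member 2 is at address
``0x30000000`` with DRC index ``0x80000012``.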