74e2845c | 07-Mar-2024 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
hmat acpi: Fix out of bounds access due to missing use of indirection
With a numa set up such as
  -numa nodeid=0,cpus=0 \
  -numa nodeid=1,memdev=mem \
  -numa nodeid=2,cpus=1
and appropriate hmat_lb entries, the initiator list is correctly computed and written to HMAT as 0,2, but then the LB data is accessed using the node id (here 2), landing outside the entry_list array.
Stash the reverse lookup when writing the initiator list and use it to get the correct array index.
Fixes: 4586a2cb83 ("hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
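The indexing problem described in this commit can be illustrated with a minimal sketch. It is not the QEMU code: the struct, the function names and the flat per-initiator entry array are hypothetical simplifications. It only shows the idea of recording a node-id-to-list-index map while the initiator list is built, then using that map instead of the raw node id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8

/* Hypothetical, simplified node description (not QEMU's real struct). */
struct node_info {
    int has_cpu; /* non-zero if the node is an initiator */
};

/*
 * Build the initiator list and record, for every node id, its index in
 * that list (-1 if the node is not an initiator).
 * Returns the number of initiators.
 */
static size_t build_initiator_list(const struct node_info *nodes,
                                   size_t num_nodes,
                                   uint32_t *initiator_list,
                                   int *node_to_index)
{
    size_t num_initiator = 0;

    for (size_t i = 0; i < num_nodes; i++) {
        if (nodes[i].has_cpu) {
            node_to_index[i] = (int)num_initiator;
            initiator_list[num_initiator++] = (uint32_t)i;
        } else {
            node_to_index[i] = -1;
        }
    }
    return num_initiator;
}

/* Fetch a per-initiator entry by node id through the reverse map. */
static uint16_t entry_for_node(const uint16_t *entry_list,
                               size_t num_initiator,
                               const int *node_to_index,
                               size_t node_id)
{
    int idx = node_to_index[node_id];

    /* Indexing entry_list[node_id] directly would overrun for node 2. */
    assert(idx >= 0 && (size_t)idx < num_initiator);
    return entry_list[idx];
}
```

With nodes 0 and 2 as initiators, the list holds two entries, so node id 2 must map to index 1 rather than being used as an index itself.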
|
2eb6672c | 07-Mar-2024 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
hmat acpi: Do not add Memory Proximity Domain Attributes Structure targeting nonexistent memory.
If QEMU is started with a proximity node containing CPUs alone, it will provide one of these structures to say memory in this node is directly connected to itself.
This description is arguably pointless even if there is memory in the node. If there is no memory present, and hence no SRAT entry, it breaks Linux HMAT parsing and the table is rejected.
https://elixir.bootlin.com/linux/v6.7/source/drivers/acpi/numa/hmat.c#L444
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
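The rule this commit describes, emitting a Memory Proximity Domain Attributes Structure only for nodes that actually have memory, can be sketched as a simple filter. The struct and function below are hypothetical stand-ins, not the QEMU data structures, and the sketch assumes a memory size of zero marks a CPU-only node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node summary; mem_size == 0 means a CPU-only node. */
struct numa_node_sketch {
    uint64_t mem_size;
};

/*
 * Collect the node ids that should get a Memory Proximity Domain
 * Attributes Structure: only nodes with memory are listed as targets.
 * Returns the number of ids written to 'targets'.
 */
static size_t mpda_targets(const struct numa_node_sketch *nodes,
                           size_t num_nodes, uint32_t *targets)
{
    size_t count = 0;

    assert(nodes != NULL && targets != NULL);
    for (size_t i = 0; i < num_nodes; i++) {
        if (nodes[i].mem_size == 0) {
            continue; /* no memory, hence no structure for this node */
        }
        targets[count++] = (uint32_t)i;
    }
    return count;
}
```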
|
0a5b5acd | 08-Mar-2024 |
Ankit Agrawal <ankita@nvidia.com> |
hw/acpi: Implement the SRAT GI affinity structure
The ACPI spec provides a scheme to associate "Generic Initiators" [1] (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with integrated compute or DMA engines) with Proximity Domains. This is achieved using the Generic Initiator Affinity Structure in SRAT. During bootup, the Linux kernel parses the ACPI SRAT to determine the PXM IDs and creates a NUMA node for each unique PXM ID encountered. QEMU currently does not implement these structures while building SRAT.
Add GI structures while building the VM ACPI SRAT. The association between device and node is stored using the acpi-generic-initiator object. Look up all such objects and use them to build these structures.
The structure needs a PCI device handle [2] that consists of the device BDF. The vfio-pci device corresponding to the acpi-generic-initiator object is located to determine the BDF.
[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5.80
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-3-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
b64b7ed8 | 08-Mar-2024 |
Ankit Agrawal <ankita@nvidia.com> |
qom: new object to associate device to NUMA node
NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (up to 8) isolated instances. Each memory partition needs a dedicated NUMA node to operate. The partitions are not fixed and they can be created/deleted at runtime.
Unfortunately, Linux does not provide a means to dynamically create/destroy NUMA nodes, and implementing such a feature is not expected to be trivial. The nodes that the OS discovers at boot time while parsing SRAT remain fixed. So we utilize the Generic Initiator (GI) Affinity structures, which allow association between nodes and devices. Multiple GI structures per BDF are possible, allowing creation of multiple nodes by exposing a unique PXM in each of these structures.
Implement the mechanism to build the GI affinity structures, as QEMU currently does not. Introduce a new acpi-generic-initiator object to let the host admin link a device with an associated NUMA node. QEMU maintains this association and uses this object to build the requisite GI Affinity Structure.
When multiple NUMA nodes are associated with a device, that many acpi-generic-initiator objects must be created, each representing a unique device:node association.
The following is one decoded GI affinity structure from the VM ACPI SRAT:
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
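The field offsets and sizes in the decoded dump above add up to a 32-byte (0x20) structure. The sketch below serializes one such structure into a byte buffer using only the layout visible in the dump. The function and macro names are hypothetical and this is not the QEMU table builder; the 16-byte device handle is treated as opaque bytes supplied by the caller, so no assumption is made about how the BDF is encoded inside it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GI_SUBTABLE_TYPE 5    /* "Generic Initiator Affinity" in the dump */
#define GI_LENGTH        0x20 /* 32 bytes, as shown in the Length field */
#define GI_HANDLE_LEN    16
#define GI_FLAG_ENABLED  1u

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill 'out' with one GI affinity structure laid out as in the dump. */
static void build_gi_affinity(uint8_t out[GI_LENGTH], uint8_t handle_type,
                              uint32_t pxm,
                              const uint8_t handle[GI_HANDLE_LEN],
                              uint32_t flags)
{
    assert(out != NULL && handle != NULL);
    memset(out, 0, GI_LENGTH);            /* zeroes both reserved fields */
    out[0] = GI_SUBTABLE_TYPE;            /* offset 0: Subtable Type */
    out[1] = GI_LENGTH;                   /* offset 1: Length */
    out[3] = handle_type;                 /* offset 3: Device Handle Type */
    put_le32(out + 4, pxm);               /* offset 4: Proximity Domain */
    memcpy(out + 8, handle, GI_HANDLE_LEN); /* offset 8: Device Handle */
    put_le32(out + 24, flags);            /* offset 24: Flags */
}
```

Calling it with handle type 1, proximity domain 7, the handle bytes from the dump and the Enabled flag reproduces the field values shown there.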
An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through the pci-dev argument) with the desired NUMA node (using the node argument). Currently, only PCI devices are supported.
For the Grace Hopper system, create a range of 8 nodes and associate them with the device using acpi-generic-initiator objects. While a configuration of fewer than 8 nodes per device is allowed, such a configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and links them to the respective PCI device using acpi-generic-initiator objects:
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \

-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
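Because each device:node association needs its own object, option lists like the sample above are repetitive and easy to generate. The helper below is a hypothetical convenience, not part of QEMU. It formats one -object option using the id, pci-dev and node arguments named in this commit message; the gi<N> id scheme simply mirrors the sample.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one acpi-generic-initiator option linking device 'dev_id' to
 * NUMA node 'node'. Returns the snprintf() result: the number of
 * characters needed, excluding the terminating NUL.
 */
static int format_gi_object(char *buf, size_t len, unsigned gi_index,
                            const char *dev_id, unsigned node)
{
    assert(buf != NULL && dev_id != NULL && strlen(dev_id) > 0);
    return snprintf(buf, len,
                    "-object acpi-generic-initiator,id=gi%u,pci-dev=%s,node=%u",
                    gi_index, dev_id, node);
}

/* Print 'count' consecutive options, one per line, to 'out'. */
static void print_gi_objects(FILE *out, const char *dev_id,
                             unsigned first_gi, unsigned first_node,
                             unsigned count)
{
    char buf[128];

    for (unsigned i = 0; i < count; i++) {
        format_gi_object(buf, sizeof(buf), first_gi + i, dev_id,
                         first_node + i);
        fprintf(out, "%s \\\n", buf);
    }
}
```

For the first device in the sample, print_gi_objects(stdout, "dev0", 0, 2, 8) would emit the eight gi0 to gi7 lines for nodes 2 to 9.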
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
c461f3e3 | 08-Sep-2023 |
Bernhard Beschow <shentey@gmail.com> |
hw/acpi/acpi_dev_interface: Remove now unused madt_cpu virtual method
This virtual method was always set to the x86-specific pc_madt_cpu_entry(), even in piix4 which is also used in MIPS. The previous changes use pc_madt_cpu_entry() otherwise, so madt_cpu can be dropped.
Since pc_madt_cpu_entry() is now only used in x86-specific code, the stub in hw/acpi/acpi-x86-stub can be removed as well.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
c28db9e0 | 20-Apr-2023 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
hw/pci-bridge: Make PCIe and CXL PXB Devices inherit from TYPE_PXB_DEV
Previously, PXB_CXL_DEVICE, PXB_PCIE_DEVICE and PXB_DEVICE all had PCI_DEVICE as their direct parent but shared a common state struct PXBDev. convert_to_pxb() was used to get the PXBDev instance from whichever of these types it was called on.
This patch switches to an explicit hierarchy based on shared functionality. To allow use of OBJECT_DECLARE_SIMPLE_TYPE() whilst minimizing code changes, all types are renamed to have the postfix _DEV rather than _DEVICE. The new hierarchy has PXB_CXL_DEV with parent PXB_PCIE_DEV, which in turn has parent PXB_DEV, which continues to have parent PCI_DEVICE.
This allows simple use of PXB_DEV() etc rather than a custom function + removal of duplicated properties and moving the CXL specific elements out of struct PXBDev.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20230420142750.6950-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
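The parent chain described in this commit (CXL PXB, then PCIe PXB, then PXB, then PCI device) can be modelled in plain C by embedding each parent as the first member of its child. The sketch below only illustrates that idea. It does not use QEMU's QOM macros, and every struct name and field in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the PCI device base state. */
struct pci_device_sketch {
    int devfn;
};

/* Shared PXB state lives once, in the common parent. */
struct pxb_dev_sketch {
    struct pci_device_sketch parent;
    uint8_t bus_nr; /* illustrative shared property */
};

/* The PCIe PXB adds nothing here, but fixes its place in the chain. */
struct pxb_pcie_dev_sketch {
    struct pxb_dev_sketch parent;
};

/* CXL-specific elements sit only in the most derived type. */
struct pxb_cxl_dev_sketch {
    struct pxb_pcie_dev_sketch parent;
    int cxl_only_setting; /* illustrative CXL-specific field */
};

/* Upcast from the CXL type to the shared PXB state. */
static struct pxb_dev_sketch *pxb_from_cxl(struct pxb_cxl_dev_sketch *cxl)
{
    assert(cxl != NULL);
    return &cxl->parent.parent;
}
```

Because every parent is the first member, the shared state starts at the same address as the derived object, which is what makes a uniform cast helper possible in place of a custom conversion function.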
|