#
44fa20c9 |
| 18-Sep-2023 |
Cédric Le Goater <clg@redhat.com> |
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the n
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")
This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
#
44fa20c9 |
| 18-Sep-2023 |
Cédric Le Goater <clg@redhat.com> |
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the n
spapr: Remove support for NVIDIA V100 GPU with NVLink2
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits :
562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")
This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part.
Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
show more ...
|
Revision tags: v8.1.0, v8.1.0-rc4, v8.1.0-rc3, v7.2.5, v8.0.4, v8.1.0-rc2, v8.1.0-rc1, v8.1.0-rc0, v8.0.3, v7.2.4, v8.0.2, v8.0.1, v7.2.3, v7.2.2, v8.0.0, v8.0.0-rc4, v8.0.0-rc3, v7.2.1, v8.0.0-rc2, v8.0.0-rc1, v8.0.0-rc0, v7.2.0, v7.2.0-rc4, v7.2.0-rc3, v7.2.0-rc2, v7.2.0-rc1, v7.2.0-rc0, v7.1.0, v7.1.0-rc4, v7.1.0-rc3, v7.1.0-rc2, v7.1.0-rc1, v7.1.0-rc0, v7.0.0, v7.0.0-rc4, v7.0.0-rc3, v7.0.0-rc2 |
|
#
0f9668e0 |
| 23-Mar-2022 |
Marc-André Lureau <marcandre.lureau@redhat.com> |
Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo B
Remove qemu-common.h include from most units
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
Revision tags: v7.0.0-rc1, v7.0.0-rc0 |
|
#
b21e2380 |
| 15-Mar-2022 |
Markus Armbruster <armbru@redhat.com> |
Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, i
Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors.
This commit only touches allocations with size arguments of the form sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
show more ...
|
#
16282937 |
| 02-Mar-2022 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doin
hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()
We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit cleaner:
- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the 'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a single line. No problem with that, but splitting into 2 lines is clearer to follow
- use g_autofree in 'int_buf' to avoid a g_free() call later on
- 'buf_len' is only being used to store the size of 'int_buf' malloc. Remove the var and just use the value in g_malloc0() directly
- remove the 'ret' var and just return the result of fdt_setprop()
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220228175004.8862-12-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
show more ...
|
Revision tags: v6.1.1, v6.2.0, v6.2.0-rc4, v6.2.0-rc3, v6.2.0-rc2, v6.2.0-rc1 |
|
#
1fde73bc |
| 10-Nov-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fix FORM1 distance-less nodes
Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA nodes in which the distance info is absent in machine_state->numa_state->nodes. Thi
spapr_numa.c: fix FORM1 distance-less nodes
Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA nodes in which the distance info is absent in machine_state->numa_state->nodes. This happens when QEMU adds a default NUMA node and when the user adds NUMA nodes without specifying the distances.
During the discussions of the forementioned patch [1] it was found that FORM1 guests were behaving in a strange way in the same scenario, with the kernel seeing the distances between the nodes as '160', as we can see in this example with 4 NUMA nodes without distance information:
$ numactl -H available: 4 nodes (0-3) (...) node distances: node 0 1 2 3 0: 10 160 160 160 1: 160 10 160 160 2: 160 160 10 160 3: 160 160 160 10
Turns out that we have the same problem with FORM1 guests - we are calculating associativity domain using zeroed values. And as it also turns out, the solution from 71e6fae3a99 applies to FORM1 as well.
This patch creates a wrapper called 'get_numa_distance' that contains the logic used in FORM2 to define node distances when this information is absent. This helper is then used in all places where we need to read distance information from machine_state->numa_state->nodes. That way we'll guarantee that the NUMA node distance is always being curated before being used.
After this patch, the FORM1 guest mentioned above will have the following topology:
$ numactl -H available: 4 nodes (0-3) (...) node distances: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 20 2: 20 20 10 20 3: 20 20 20 10
This is compatible with what FORM2 guests and other archs do in this case.
[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html
Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity") CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> CC: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
show more ...
|
Revision tags: v6.2.0-rc0 |
|
#
71e6fae3 |
| 05-Nov-2021 |
Nicholas Piggin <npiggin@gmail.com> |
spapr_numa.c: FORM2 table handle nodes with no distance info
A configuration that specifies multiple nodes without distance info results in the non-local points in the FORM2 matrix having a distance
spapr_numa.c: FORM2 table handle nodes with no distance info
A configuration that specifies multiple nodes without distance info results in the non-local points in the FORM2 matrix having a distance of 0. This causes Linux to complain "Invalid distance value range" because a node distance is smaller than the local distance.
Fix this by building a simple local / remote fallback for points where distance information is missing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20211105135137.1584840-1-npiggin@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v6.0.1 |
|
#
28d86252 |
| 22-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()
This patch has a handful of modifications for the recent added FORM2 support:
- to not allocate more than the necessary size in 'distance
spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()
This patch has a handful of modifications for the recent added FORM2 support:
- to not allocate more than the necessary size in 'distance_table'. At this moment the array is oversized due to allocating uint32_t for all elements, when most of them fits in an uint8_t. Fix it by changing the array to uint8_t and allocating the exact size;
- use stl_be_p() to store the uint32_t at the start of 'distance_table';
- use sizeof(uint32_t) to skip the uint32_t length when populating the distances;
- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding the local distance value.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210922122852.130054-2-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
0d5ba481 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: handle auto NUMA node with no distance info
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node for the pSeries machine if none was specified, but without node dist
spapr_numa.c: handle auto NUMA node with no distance info
numa_complete_configuration() in hw/core/numa.c always adds a NUMA node for the pSeries machine if none was specified, but without node distance information for the single node created.
NUMA FORM1 affinity code didn't rely on numa_state information to do its job, but FORM2 does. As is now, this is the result of a pSeries guest with NUMA FORM2 affinity when no NUMA nodes is specified:
$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15681 MB No distance information available.
This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're enforcing that the local distance (the distance to the node to itself) is always 10. This allows for the proper creation of the NUMA distance tables, fixing the output of 'numactl -H' in the guest:
$ numactl -H available: 1 nodes (0) node 0 cpus: 0 node 0 size: 16222 MB node 0 free: 15685 MB node distances: node 0 0: 10
CC: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-8-danielhb413@gmail.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
e0eb84d4 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and strai
spapr_numa.c: FORM2 NUMA affinity support
The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and straightforward NUMA distance assignment without relying on complex associations between several levels of NUMA via ibm,associativity matches. Another feature is its extensibility. This base support contains the facilities for NUMA distance assignment, but in the future more facilities will be added for latency, performance, bandwidth and so on.
This patch implements the base FORM2 affinity support as follows:
- the use of FORM2 associativity is indicated by using bit 2 of byte 5 of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1 or FORM2 affinity. Setting both forms will default to FORM2. We're not advertising FORM2 for pseries-6.1 and older machine versions to prevent guest visible changes in those;
- ibm,associativity-reference-points has a new semantic. Instead of being used to calculate distances via NUMA levels, it's now used to indicate the primary domain index in the ibm,associativity domain of each resource. In our case it's set to {0x4}, matching the position where we already place logical_domain_id;
- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table and ibm,numa-distance-table. The index table is used to list all the NUMA logical domains of the platform, in ascending order, and allows for spartial NUMA configurations (although QEMU ATM doesn't support that). ibm,numa-distance-table is an array that contains all the distances from the first NUMA node to all other nodes, then the second NUMA node distances to all other nodes and so on;
- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity() now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2 affinity during CAS.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
5dab5abe |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has
spapr: move FORM1 verifications to post CAS
FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has a different latency than the original NUMA node from the regular memory. FORM2 is also able to deal with asymmetric NUMA distances gracefully, something that our FORM1 implementation doesn't do.
Move these FORM1 verifications to a new function and wait until after CAS, when we're sure that we're sticking with FORM1, to enforce them.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
a165ac67 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data struc
spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array
Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data structures and functions to be completely separated from the existing FORM1 code, allowing us to avoid adding new code that inherits the existing complexity of FORM1.
The idea of switching values used by the write_dt() functions in spapr_numa.c was already introduced in the previous patch, and the same approach will be used when dealing with the FORM1 and FORM2 arrays.
We can accomplish that by that by renaming the existing numa_assoc_array to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity data. A new helper get_associativity() is then introduced to be used by the write_dt() functions to retrieve the current ibm,associativity array of a given node, after considering affinity selection that might have been done during CAS. All code that was using numa_assoc_array now needs to retrieve the array by calling this function.
This will allow for an easier plug of FORM2 data later on.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
3a6e4ce6 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: parametrize FORM1 macros
The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, N
spapr_numa.c: parametrize FORM1 macros
The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch accomplishes that by doing the following:
- move the NUMA related macros from spapr.h to spapr_numa.c where they are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used to refer to the maximum number of NUMA nodes, including GPU nodes, that the machine can support;
- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific macros are used in FORM1 init functions;
- code that uses MAX_DISTANCE_REF_POINTS now retrieves the max_dist_ref_points value using get_max_dist_ref_points(). NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE is replaced by get_vcpu_assoc_size(). These functions are used by the generic device tree functions and h_home_node_associativity() and will allow them to switch between FORM1 and FORM2 without changing their core logic.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
afa3b3c9 |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: scrap 'legacy_numa' concept
When first introduced, 'legacy_numa' was a way to refer to guests that either wouldn't be affected by associativity domain calculations, namely the ones wit
spapr_numa.c: scrap 'legacy_numa' concept
When first introduced, 'legacy_numa' was a way to refer to guests that either wouldn't be affected by associativity domain calculations, namely the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be affected by it because it would be an userspace change. Calling these cases 'legacy_numa' was a convenient way to label these cases.
We're about to introduce a new NUMA affinity, FORM2, and this concept of 'legacy_numa' is now a bit misleading because, although it is called 'legacy' it is in fact a FORM1 exclusive contraint.
This patch removes spapr_machine_using_legacy_numa() and open code the conditions in each caller. While we're at it, move the chunk inside spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was being executed if !pre_5_2_numa_associativity and num_nodes => 1, the same conditions in which spapr_numa_define_FORM1_domains() is called shortly after.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
d98dbe2a |
| 20-Sep-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: split FORM1 code into helpers
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies and doesn't need be concerned with all the legacy support for older pseries FORM1
spapr_numa.c: split FORM1 code into helpers
The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies and doesn't need be concerned with all the legacy support for older pseries FORM1 guests.
We're also not going to calculate associativity domains based on numa distance (via spapr_numa_define_associativity_domains) since the distances will be written directly into new DT properties.
Let's split FORM1 code into its own functions to allow for easier insertion of FORM2 logic later on.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v6.1.0, v6.1.0-rc4, v6.1.0-rc3, v6.1.0-rc2, v6.1.0-rc1, v6.1.0-rc0, v6.0.0, v6.0.0-rc5, v6.0.0-rc4, v6.0.0-rc3, v6.0.0-rc2, v6.0.0-rc1, v6.0.0-rc0 |
|
#
b01fec36 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: fix ibm,max-associativity-domains calculation
The current logic for calculating 'maxdomain' making it a sum of numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is used
spapr_numa.c: fix ibm,max-associativity-domains calculation
The current logic for calculating 'maxdomain' making it a sum of numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is used as a index to determine the next available NUMA id that a given NVGPU can use.
The problem is that the initial value of gpu_numa_id, for any topology that has more than one NUMA node, is equal to numa_state->num_nodes. This means that our maxdomain will always be, at least, twice the amount of existing NUMA nodes. This means that a guest with 4 NUMA nodes will end up with the following max-associativity-domains:
rtas/ibm,max-associativity-domains 00000004 00000008 00000008 00000008 00000008
This overtuning of maxdomains doesn't go unnoticed in the guest, being detected in SLUB during boot:
dmesg | grep SLUB [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8
SLUB is detecting 8 total nodes, with 4 nodes being online.
This patch fixes ibm,max-associativity-domains by considering the amount of NVGPUs NUMA nodes presented in the guest, instead of just spapr->gpu_numa_id.
Reported-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
66407069 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper
We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access an
spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper
We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access and to avoid repetition.
Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
3b880445 |
| 28-Jan-2021 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c
This function is used only in spapr_numa.c.
Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-o
spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c
This function is used only in spapr_numa.c.
Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v5.2.0, v5.2.0-rc4, v5.2.0-rc3, v5.2.0-rc2, v5.2.0-rc1, v5.2.0-rc0 |
|
#
690fbe42 |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: consider user input when defining associativity
A new function called spapr_numa_define_associativity_domains() is created to calculate the associativity domains and change the associati
spapr_numa: consider user input when defining associativity
A new function called spapr_numa_define_associativity_domains() is created to calculate the associativity domains and change the associativity arrays considering user input. This is how the associativity domain between two NUMA nodes A and B is calculated:
- get the distance D between them
- get the correspondent NUMA level 'n_level' for D. This is done via a helper called spapr_numa_get_numa_level()
- all associativity arrays were initialized with their own numa_ids, and we're calculating the distance in node_id ascending order, starting from node id 0 (the first node retrieved by numa_state). This will have a cascade effect in the algorithm because the associativity domains that node 0 defines will be carried over to other nodes, and node 1 associativities will be carried over after taking node 0 associativities into account, and so on. This happens because we'll assign assoc_src as the associativity domain of dst as well, for all NUMA levels beyond and including n_level.
The PPC kernel expects the associativity domains of the first node (node id 0) to be always 0 [1], and this algorithm will grant that by default.
Ultimately, all of this results in a best effort approximation for the actual NUMA distances the user input in the command line. Given the nature of how PAPR itself interprets NUMA distances versus the expectations risen by how ACPI SLIT works, there might be better algorithms but, in the end, it'll also result in another way to approximate what the user really wanted.
To keep this commit message no longer than it already is, the next patch will update the existing documentation in ppc-spapr-numa.rst with more in depth details and design considerations/drawbacks.
[1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
491e884e |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: change reference-points and maxdomain settings
This is the first guest visible change introduced in spapr_numa.c. The previous settings of both reference-points and maxdomains were too r
spapr_numa: change reference-points and maxdomain settings
This is the first guest visible change introduced in spapr_numa.c. The previous settings of both reference-points and maxdomains were too restrictive, but enough for the existing associativity we're setting in the resources.
We'll change that in the following patches, populating the associativity arrays based on user input. For those changes to be effective, reference-points and maxdomains must be more flexible. After this patch, we'll have 4 distinct levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will allow for any type of configuration the user intends to do - under the scope and limitations of PAPR itself, of course.
Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
ee6635b2 |
| 07-Oct-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: forbid asymmetrical NUMA setups
The pSeries machine does not support asymmetrical NUMA configurations. This doesn't make much of a different since we're not using user input for pSeries
spapr_numa: forbid asymmetrical NUMA setups
The pSeries machine does not support asymmetrical NUMA configurations. This doesn't make much of a different since we're not using user input for pSeries NUMA setup, but this will change in the next patches.
To avoid breaking existing setups, gate this change by checking for legacy NUMA support.
Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20201007172849.302240-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
Revision tags: v5.0.1 |
|
#
876ab8d8 |
| 04-Sep-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: use spapr_numa_get_vcpu_assoc() in home_node hcall
The current implementation of h_home_node_associativity hard codes the values of associativity domains of the vcpus. Let's make it cons
spapr_numa: use spapr_numa_get_vcpu_assoc() in home_node hcall
The current implementation of h_home_node_associativity hard codes the values of associativity domains of the vcpus. Let's make it consider the values already initialized in spapr->numa_assoc_array, via the spapr_numa_get_vcpu_assoc() helper.
We want to set it and forget it, and for that we also need to assert that we don't overflow the registers of the hypercall. >From R4 to R9 we can squeeze in 12 associativity domains for vcpus, so let's assert that VCPU_ASSOC_SIZE -1 isn't greater than that.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
d370f9cf |
| 04-Sep-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: create a vcpu associativity helper
The work to be done in h_home_node_associativity() intersects with what is already done in spapr_numa_fixup_cpu_dt(). This patch creates a new helper,
spapr_numa: create a vcpu associativity helper
The work to be done in h_home_node_associativity() intersects with what is already done in spapr_numa_fixup_cpu_dt(). This patch creates a new helper, spapr_numa_get_vcpu_assoc(), to be used for both spapr_numa_fixup_cpu_dt() and h_home_node_associativity().
While we're at it, use memcpy() instead of loop assignment to created the returned array.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
f8a13fc3 |
| 04-Sep-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr: move h_home_node_associativity to spapr_numa.c
The implementation of this hypercall will be modified to use spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes make more sense.
R
spapr: move h_home_node_associativity to spapr_numa.c
The implementation of this hypercall will be modified to use spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes make more sense.
Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200904172422.617460-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|
#
dd7e1d7a |
| 03-Sep-2020 |
Daniel Henrique Barboza <danielhb413@gmail.com> |
spapr_numa: move NVLink2 associativity handling to spapr_numa.c
The NVLink2 GPUs works like a regular NUMA node with its own associativity values, regardless of user input.
This can be handled insi
spapr_numa: move NVLink2 associativity handling to spapr_numa.c
The NVLink2 GPUs works like a regular NUMA node with its own associativity values, regardless of user input.
This can be handled inside spapr_numa_associativity_init(), initializing NVGPU_MAX_NUM associativity arrays that can be used by the GPUs.
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200903220639.563090-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
show more ...
|