History log of /qemu/hw/ppc/spapr_numa.c (Results 1 – 25 of 29)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 44fa20c9 18-Sep-2023 Cédric Le Goater <clg@redhat.com>

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the n

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the nvlink support")
b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")

This was 2.5 years ago. Do the same in QEMU with a revert of commit
ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
adjustements are required on the NUMA part.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230918091717.149950-1-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

show more ...


# 44fa20c9 18-Sep-2023 Cédric Le Goater <clg@redhat.com>

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the n

spapr: Remove support for NVIDIA V100 GPU with NVLink2

NVLink2 support was removed from the PPC PowerNV platform and VFIO in
Linux 5.13 with commits :

562d1e207d32 ("powerpc/powernv: remove the nvlink support")
b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2")

This was 2.5 years ago. Do the same in QEMU with a revert of commit
ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some
adjustements are required on the NUMA part.

Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Message-ID: <20230918091717.149950-1-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

show more ...


Revision tags: v8.1.0, v8.1.0-rc4, v8.1.0-rc3, v7.2.5, v8.0.4, v8.1.0-rc2, v8.1.0-rc1, v8.1.0-rc0, v8.0.3, v7.2.4, v8.0.2, v8.0.1, v7.2.3, v7.2.2, v8.0.0, v8.0.0-rc4, v8.0.0-rc3, v7.2.1, v8.0.0-rc2, v8.0.0-rc1, v8.0.0-rc0, v7.2.0, v7.2.0-rc4, v7.2.0-rc3, v7.2.0-rc2, v7.2.0-rc1, v7.2.0-rc0, v7.1.0, v7.1.0-rc4, v7.1.0-rc3, v7.1.0-rc2, v7.1.0-rc1, v7.1.0-rc0, v7.0.0, v7.0.0-rc4, v7.0.0-rc3, v7.0.0-rc2
# 0f9668e0 23-Mar-2022 Marc-André Lureau <marcandre.lureau@redhat.com>

Remove qemu-common.h include from most units

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo B

Remove qemu-common.h include from most units

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


Revision tags: v7.0.0-rc1, v7.0.0-rc0
# b21e2380 15-Mar-2022 Markus Armbruster <armbru@redhat.com>

Use g_new() & friends where that makes obvious sense

g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, i

Use g_new() & friends where that makes obvious sense

g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).

Patch created mechanically with:

$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>

show more ...


# 16282937 02-Mar-2022 Daniel Henrique Barboza <danielhb413@gmail.com>

hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()

We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:

- 'cur_index = int_buf = g_malloc0(..)' is doin

hw/ppc/spapr_numa.c: simplify spapr_numa_write_assoc_lookup_arrays()

We can get the job done in spapr_numa_write_assoc_lookup_arrays() a bit
cleaner:

- 'cur_index = int_buf = g_malloc0(..)' is doing a g_malloc0() in the
'int_buf' pointer and making 'cur_index' point to 'int_buf' all in a
single line. No problem with that, but splitting into 2 lines is clearer
to follow

- use g_autofree in 'int_buf' to avoid a g_free() call later on

- 'buf_len' is only being used to store the size of 'int_buf' malloc.
Remove the var and just use the value in g_malloc0() directly

- remove the 'ret' var and just return the result of fdt_setprop()

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220228175004.8862-12-danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

show more ...


Revision tags: v6.1.1, v6.2.0, v6.2.0-rc4, v6.2.0-rc3, v6.2.0-rc2, v6.2.0-rc1
# 1fde73bc 10-Nov-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fix FORM1 distance-less nodes

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. Thi

spapr_numa.c: fix FORM1 distance-less nodes

Commit 71e6fae3a99 fixed an issue with FORM2 affinity guests with NUMA
nodes in which the distance info is absent in
machine_state->numa_state->nodes. This happens when QEMU adds a default
NUMA node and when the user adds NUMA nodes without specifying the
distances.

During the discussions of the forementioned patch [1] it was found that
FORM1 guests were behaving in a strange way in the same scenario, with
the kernel seeing the distances between the nodes as '160', as we can
see in this example with 4 NUMA nodes without distance information:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 160 160 160
1: 160 10 160 160
2: 160 160 10 160
3: 160 160 160 10

Turns out that we have the same problem with FORM1 guests - we are
calculating associativity domain using zeroed values. And as it also
turns out, the solution from 71e6fae3a99 applies to FORM1 as well.

This patch creates a wrapper called 'get_numa_distance' that contains
the logic used in FORM2 to define node distances when this information
is absent. This helper is then used in all places where we need to read
distance information from machine_state->numa_state->nodes. That way
we'll guarantee that the NUMA node distance is always being curated
before being used.

After this patch, the FORM1 guest mentioned above will have the
following topology:

$ numactl -H
available: 4 nodes (0-3)
(...)
node distances:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 20
2: 20 20 10 20
3: 20 20 20 10

This is compatible with what FORM2 guests and other archs do in this
case.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg01960.html

Fixes: 690fbe4295d5 ("spapr_numa: consider user input when defining associativity")
CC: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
CC: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

show more ...


Revision tags: v6.2.0-rc0
# 71e6fae3 05-Nov-2021 Nicholas Piggin <npiggin@gmail.com>

spapr_numa.c: FORM2 table handle nodes with no distance info

A configuration that specifies multiple nodes without distance info
results in the non-local points in the FORM2 matrix having a distance

spapr_numa.c: FORM2 table handle nodes with no distance info

A configuration that specifies multiple nodes without distance info
results in the non-local points in the FORM2 matrix having a distance of
0. This causes Linux to complain "Invalid distance value range" because
a node distance is smaller than the local distance.

Fix this by building a simple local / remote fallback for points where
distance information is missing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20211105135137.1584840-1-npiggin@gmail.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v6.0.1
# 28d86252 22-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()

This patch has a handful of modifications for the recent added
FORM2 support:

- to not allocate more than the necessary size in 'distance

spapr_numa.c: fixes in spapr_numa_FORM2_write_rtas_tables()

This patch has a handful of modifications for the recent added
FORM2 support:

- to not allocate more than the necessary size in 'distance_table'.
At this moment the array is oversized due to allocating uint32_t for
all elements, when most of them fits in an uint8_t. Fix it by
changing the array to uint8_t and allocating the exact size;

- use stl_be_p() to store the uint32_t at the start of 'distance_table';

- use sizeof(uint32_t) to skip the uint32_t length when populating the
distances;

- use the NUMA_DISTANCE_MIN macro from sysemu/numa.h to avoid hardcoding
the local distance value.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210922122852.130054-2-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 0d5ba481 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: handle auto NUMA node with no distance info

numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node dist

spapr_numa.c: handle auto NUMA node with no distance info

numa_complete_configuration() in hw/core/numa.c always adds a NUMA node
for the pSeries machine if none was specified, but without node distance
information for the single node created.

NUMA FORM1 affinity code didn't rely on numa_state information to do its
job, but FORM2 does. As is now, this is the result of a pSeries guest
with NUMA FORM2 affinity when no NUMA nodes is specified:

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15681 MB
No distance information available.

This can be amended in spapr_numa_FORM2_write_rtas_tables(). We're
enforcing that the local distance (the distance to the node to itself) is
always 10. This allows for the proper creation of the NUMA distance tables,
fixing the output of 'numactl -H' in the guest:

$ numactl -H
available: 1 nodes (0)
node 0 cpus: 0
node 0 size: 16222 MB
node 0 free: 15685 MB
node distances:
node 0
0: 10

CC: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-8-danielhb413@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# e0eb84d4 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: FORM2 NUMA affinity support

The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and strai

spapr_numa.c: FORM2 NUMA affinity support

The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity()
now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest
selected FORM2 affinity during CAS.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-7-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 5dab5abe 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr: move FORM1 verifications to post CAS

FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has

spapr: move FORM1 verifications to post CAS

FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also able to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-6-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# a165ac67 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array

Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
struc

spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array

Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.

The idea of switching values used by the write_dt() functions in
spapr_numa.c was already introduced in the previous patch, and
the same approach will be used when dealing with the FORM1 and FORM2
arrays.

We can accomplish that by that by renaming the existing numa_assoc_array
to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity
data. A new helper get_associativity() is then introduced to be used by the
write_dt() functions to retrieve the current ibm,associativity array of
a given node, after considering affinity selection that might have been
done during CAS. All code that was using numa_assoc_array now needs to
retrieve the array by calling this function.

This will allow for an easier plug of FORM2 data later on.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 3a6e4ce6 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: parametrize FORM1 macros

The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, N

spapr_numa.c: parametrize FORM1 macros

The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:

- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;

- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;

- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# afa3b3c9 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: scrap 'legacy_numa' concept

When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones wit

spapr_numa.c: scrap 'legacy_numa' concept

When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be
affected by it because it would be an userspace change. Calling these
cases 'legacy_numa' was a convenient way to label these cases.

We're about to introduce a new NUMA affinity, FORM2, and this concept
of 'legacy_numa' is now a bit misleading because, although it is called
'legacy' it is in fact a FORM1 exclusive contraint.

This patch removes spapr_machine_using_legacy_numa() and open code the
conditions in each caller. While we're at it, move the chunk inside
spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains
with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was
being executed if !pre_5_2_numa_associativity and num_nodes => 1, the
same conditions in which spapr_numa_define_FORM1_domains() is called
shortly after.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# d98dbe2a 20-Sep-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: split FORM1 code into helpers

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1

spapr_numa.c: split FORM1 code into helpers

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1 guests.

We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.

Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210920174947.556324-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v6.1.0, v6.1.0-rc4, v6.1.0-rc3, v6.1.0-rc2, v6.1.0-rc1, v6.1.0-rc0, v6.0.0, v6.0.0-rc5, v6.0.0-rc4, v6.0.0-rc3, v6.0.0-rc2, v6.0.0-rc1, v6.0.0-rc0
# b01fec36 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: fix ibm,max-associativity-domains calculation

The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is
used

spapr_numa.c: fix ibm,max-associativity-domains calculation

The current logic for calculating 'maxdomain' making it a sum of
numa_state->num_nodes with spapr->gpu_numa_id. spapr->gpu_numa_id is
used as a index to determine the next available NUMA id that a
given NVGPU can use.

The problem is that the initial value of gpu_numa_id, for any topology
that has more than one NUMA node, is equal to numa_state->num_nodes.
This means that our maxdomain will always be, at least, twice the
amount of existing NUMA nodes. This means that a guest with 4 NUMA
nodes will end up with the following max-associativity-domains:

rtas/ibm,max-associativity-domains
00000004 00000008 00000008 00000008 00000008

This overtuning of maxdomains doesn't go unnoticed in the guest, being
detected in SLUB during boot:

dmesg | grep SLUB
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=8

SLUB is detecting 8 total nodes, with 4 nodes being online.

This patch fixes ibm,max-associativity-domains by considering the amount
of NVGPUs NUMA nodes presented in the guest, instead of just
spapr->gpu_numa_id.

Reported-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-4-danielhb413@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 66407069 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper

We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access an

spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper

We'll need to check the initial value given to spapr->gpu_numa_id when
building the rtas DT, so put it in a helper for easier access and to
avoid repetition.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 3b880445 28-Jan-2021 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c

This function is used only in spapr_numa.c.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-o

spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c

This function is used only in spapr_numa.c.

Tested-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v5.2.0, v5.2.0-rc4, v5.2.0-rc3, v5.2.0-rc2, v5.2.0-rc1, v5.2.0-rc0
# 690fbe42 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: consider user input when defining associativity

A new function called spapr_numa_define_associativity_domains()
is created to calculate the associativity domains and change
the associati

spapr_numa: consider user input when defining associativity

A new function called spapr_numa_define_associativity_domains()
is created to calculate the associativity domains and change
the associativity arrays considering user input. This is how
the associativity domain between two NUMA nodes A and B is
calculated:

- get the distance D between them

- get the correspondent NUMA level 'n_level' for D. This is done
via a helper called spapr_numa_get_numa_level()

- all associativity arrays were initialized with their own
numa_ids, and we're calculating the distance in node_id ascending
order, starting from node id 0 (the first node retrieved by
numa_state). This will have a cascade effect in the algorithm because
the associativity domains that node 0 defines will be carried over to
other nodes, and node 1 associativities will be carried over after
taking node 0 associativities into account, and so on. This
happens because we'll assign assoc_src as the associativity domain
of dst as well, for all NUMA levels beyond and including n_level.

The PPC kernel expects the associativity domains of the first node
(node id 0) to be always 0 [1], and this algorithm will grant that
by default.

Ultimately, all of this results in a best effort approximation for
the actual NUMA distances the user input in the command line. Given
the nature of how PAPR itself interprets NUMA distances versus the
expectations risen by how ACPI SLIT works, there might be better
algorithms but, in the end, it'll also result in another way to
approximate what the user really wanted.

To keep this commit message no longer than it already is, the next
patch will update the existing documentation in ppc-spapr-numa.rst
with more in depth details and design considerations/drawbacks.

[1] https://lore.kernel.org/linuxppc-dev/5e8fbea3-8faf-0951-172a-b41a2138fbcf@gmail.com/

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# 491e884e 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: change reference-points and maxdomain settings

This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdomains were too r

spapr_numa: change reference-points and maxdomain settings

This is the first guest visible change introduced in
spapr_numa.c. The previous settings of both reference-points
and maxdomains were too restrictive, but enough for the
existing associativity we're setting in the resources.

We'll change that in the following patches, populating the
associativity arrays based on user input. For those changes
to be effective, reference-points and maxdomains must be
more flexible. After this patch, we'll have 4 distinct
levels of NUMA (0x4, 0x3, 0x2, 0x1) and maxdomains will
allow for any type of configuration the user intends to
do - under the scope and limitations of PAPR itself, of
course.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# ee6635b2 07-Oct-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: forbid asymmetrical NUMA setups

The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a different
since we're not using user input for pSeries

spapr_numa: forbid asymmetrical NUMA setups

The pSeries machine does not support asymmetrical NUMA
configurations. This doesn't make much of a different
since we're not using user input for pSeries NUMA setup,
but this will change in the next patches.

To avoid breaking existing setups, gate this change by
checking for legacy NUMA support.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20201007172849.302240-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


Revision tags: v5.0.1
# 876ab8d8 04-Sep-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: use spapr_numa_get_vcpu_assoc() in home_node hcall

The current implementation of h_home_node_associativity hard codes
the values of associativity domains of the vcpus. Let's make
it cons

spapr_numa: use spapr_numa_get_vcpu_assoc() in home_node hcall

The current implementation of h_home_node_associativity hard codes
the values of associativity domains of the vcpus. Let's make
it consider the values already initialized in spapr->numa_assoc_array,
via the spapr_numa_get_vcpu_assoc() helper.

We want to set it and forget it, and for that we also need to
assert that we don't overflow the registers of the hypercall.
>From R4 to R9 we can squeeze in 12 associativity domains for
vcpus, so let's assert that VCPU_ASSOC_SIZE -1 isn't greater
than that.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200904172422.617460-4-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# d370f9cf 04-Sep-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: create a vcpu associativity helper

The work to be done in h_home_node_associativity() intersects
with what is already done in spapr_numa_fixup_cpu_dt(). This
patch creates a new helper,

spapr_numa: create a vcpu associativity helper

The work to be done in h_home_node_associativity() intersects
with what is already done in spapr_numa_fixup_cpu_dt(). This
patch creates a new helper, spapr_numa_get_vcpu_assoc(), to
be used for both spapr_numa_fixup_cpu_dt() and
h_home_node_associativity().

While we're at it, use memcpy() instead of loop assignment
to created the returned array.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200904172422.617460-3-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# f8a13fc3 04-Sep-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr: move h_home_node_associativity to spapr_numa.c

The implementation of this hypercall will be modified to use
spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes
make more sense.

R

spapr: move h_home_node_associativity to spapr_numa.c

The implementation of this hypercall will be modified to use
spapr->numa_assoc_arrays input. Moving it to spapr_numa.c makes
make more sense.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200904172422.617460-2-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


# dd7e1d7a 03-Sep-2020 Daniel Henrique Barboza <danielhb413@gmail.com>

spapr_numa: move NVLink2 associativity handling to spapr_numa.c

The NVLink2 GPUs works like a regular NUMA node with its
own associativity values, regardless of user input.

This can be handled insi

spapr_numa: move NVLink2 associativity handling to spapr_numa.c

The NVLink2 GPUs works like a regular NUMA node with its
own associativity values, regardless of user input.

This can be handled inside spapr_numa_associativity_init(),
initializing NVGPU_MAX_NUM associativity arrays that can
be used by the GPUs.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20200903220639.563090-5-danielhb413@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

show more ...


12