History log of /dragonfly/sys/platform/pc64/acpica/acpi_srat.c (Results 1 – 4 of 4)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# 8e5d7c42 15-Oct-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix NUMA contention due to assymetric memory

* Fix NUMA contention in situations where memory is associated
with CPU cores assymetrically. In particular, with the 2990WX,
half the core

kernel - Fix NUMA contention due to assymetric memory

* Fix NUMA contention in situations where memory is associated
with CPU cores assymetrically. In particular, with the 2990WX,
half the cores will have no memory associated with them.

* This was forcing DFly to allocate memory from queues belonging to
other nearby cores, causing unnecessary SMP contention, as well
as burn extra time iterating queues.

* Fix by calculating the average number of free pages per-core,
and then adjust any VM page queue with pages less than the average
by stealing pages from queues with greater than the average.
We use a simple iterator to steal pages, so the CPUs with less
(or zero) direct-attached memory will operate more UMA-like
(just on 4K boundaries instead of 256-1024 byte boundaries).

* Tested with a 64-thread concurrent compile test. systat -pv 1
showed all remaining contention disappear. Literally, *ZERO*
contention when we run the test with each thread in its own jail
with no shared resources.

* NOTE! This fix is specific to asymetric NUMA configurations
which are fairly rare in the wild and will not speed up more
conventional systems.

* Before and after timings on the 2990WX.

cd /tmp/src
time make -j 128 nativekernel NO_MODULES=TRUE > /dev/null

BEFORE
703.915u 167.605s 0:49.97 1744.0% 9993+749k 22188+8io 216pf+0w
699.550u 171.148s 0:50.87 1711.5% 9994+749k 21066+8io 150pf+0w

AFTER
678.406u 108.857s 0:45.66 1724.1% 10105+757k 22188+8io 216pf+0w
674.805u 115.256s 0:46.67 1692.8% 10077+755k 21066+8io 150pf+0w

This is a 4.2 second difference on the second run, an over 8%
improvement which is nothing to sneeze at.

show more ...


Revision tags: v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1
# 6f1d2f41 30-Apr-2017 Sascha Wildner <saw@online.de>

kernel/acpi_srat: Remove some unused code.


Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 6f2099fe 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)

* Fix miscellaneous bugs in the recent NUMA commits.

* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will

kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)

* Fix miscellaneous bugs in the recent NUMA commits.

* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will
disable the NUMA code. Note that NUMA is only applicable on multi-socket
systems.

show more ...


# c7f9edd8 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions

* Add NUMA awareness to the kernel memory subsystem. This first iteration
will primarily affect user pages. kmalloc and objca

kernel - Add NUMA awareness to vm_page_alloc() and related functions

* Add NUMA awareness to the kernel memory subsystem. This first iteration
will primarily affect user pages. kmalloc and objcache are not
NUMA-friendly yet (and its questionable how useful it would be to make
them so).

* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket
xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.

show more ...