Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1 |
|
#
8e5d7c42 |
| 15-Oct-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix NUMA contention due to assymetric memory
* Fix NUMA contention in situations where memory is associated with CPU cores assymetrically. In particular, with the 2990WX, half the core
kernel - Fix NUMA contention due to assymetric memory
* Fix NUMA contention in situations where memory is associated with CPU cores assymetrically. In particular, with the 2990WX, half the cores will have no memory associated with them.
* This was forcing DFly to allocate memory from queues belonging to other nearby cores, causing unnecessary SMP contention, as well as burn extra time iterating queues.
* Fix by calculating the average number of free pages per-core, and then adjust any VM page queue with pages less than the average by stealing pages from queues with greater than the average. We use a simple iterator to steal pages, so the CPUs with less (or zero) direct-attached memory will operate more UMA-like (just on 4K boundaries instead of 256-1024 byte boundaries).
* Tested with a 64-thread concurrent compile test. systat -pv 1 showed all remaining contention disappear. Literally, *ZERO* contention when we run the test with each thread in its own jail with no shared resources.
* NOTE! This fix is specific to asymetric NUMA configurations which are fairly rare in the wild and will not speed up more conventional systems.
* Before and after timings on the 2990WX.
cd /tmp/src time make -j 128 nativekernel NO_MODULES=TRUE > /dev/null
BEFORE 703.915u 167.605s 0:49.97 1744.0% 9993+749k 22188+8io 216pf+0w 699.550u 171.148s 0:50.87 1711.5% 9994+749k 21066+8io 150pf+0w
AFTER 678.406u 108.857s 0:45.66 1724.1% 10105+757k 22188+8io 216pf+0w 674.805u 115.256s 0:46.67 1692.8% 10077+755k 21066+8io 150pf+0w
This is a 4.2 second difference on the second run, an over 8% improvement which is nothing to sneeze at.
show more ...
|
Revision tags: v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1 |
|
#
6f1d2f41 |
| 30-Apr-2017 |
Sascha Wildner <saw@online.de> |
kernel/acpi_srat: Remove some unused code.
|
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc |
|
#
6f2099fe |
| 06-Jan-2017 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)
* Fix miscellaneous bugs in the recent NUMA commits.
* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will
kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)
* Fix miscellaneous bugs in the recent NUMA commits.
* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will disable the NUMA code. Note that NUMA is only applicable on multi-socket systems.
show more ...
|
#
c7f9edd8 |
| 06-Jan-2017 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add NUMA awareness to vm_page_alloc() and related functions
* Add NUMA awareness to the kernel memory subsystem. This first iteration will primarily affect user pages. kmalloc and objca
kernel - Add NUMA awareness to vm_page_alloc() and related functions
* Add NUMA awareness to the kernel memory subsystem. This first iteration will primarily affect user pages. kmalloc and objcache are not NUMA-friendly yet (and its questionable how useful it would be to make them so).
* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.
show more ...
|