History log of /dragonfly/sys/kern/subr_cpu_topology.c (Results 1 – 23 of 23)
Revision Date Author Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1
# 53a91b8e 04-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Minor optimizations

* Minor __predict and __read_mostly/frequently optimizations.


Revision tags: v5.8.0, v5.9.0, v5.8.0rc1
# 9cd8f4f8 11-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Warn/assert on broken ACPI MADT

* Add warnings and assertions for broken ACPI MADT tables. I encountered
this trying to boot a 3990X on a motherboard with an old BIOS that didn't
support it. It tried to boot anyway, but the MADT table was mangled
and caused a null-pointer indirection in the kernel. Assert nicely
instead.


Revision tags: v5.6.3
# e2164e29 18-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

<sys/slaballoc.h>: Switch to lighter <sys/_malloc.h> header.

The <sys/globaldata.h> embeds SLGlobalData that in turn embeds the
"struct malloc_type". Adjust several kernel sources for missing
includes where memory allocation is performed. Try to use alphabetical
include order.

Now (in most cases) <sys/malloc.h> is included after <sys/objcache.h>.
Once it gets cleaned up, the <sys/malloc.h> inclusion could be moved
out of <sys/idr.h> to drm Linux compat layer linux/slab.h without side
effects.


# bce6845a 18-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

kernel: Minor whitespace cleanup in a few sources.


Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# 8e5d7c42 15-Oct-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix NUMA contention due to asymmetric memory

* Fix NUMA contention in situations where memory is associated
with CPU cores asymmetrically. In particular, with the 2990WX,
half the cores will have no memory associated with them.

* This was forcing DFly to allocate memory from queues belonging to
other nearby cores, causing unnecessary SMP contention, as well
as burning extra time iterating queues.

* Fix by calculating the average number of free pages per-core,
and then adjusting any VM page queue with fewer pages than the average
by stealing pages from queues with more than the average.
We use a simple iterator to steal pages, so the CPUs with less
(or zero) direct-attached memory will operate more UMA-like
(just on 4K boundaries instead of 256-1024 byte boundaries).

* Tested with a 64-thread concurrent compile test. systat -pv 1
showed all remaining contention disappear. Literally, *ZERO*
contention when we run the test with each thread in its own jail
with no shared resources.

* NOTE! This fix is specific to asymmetric NUMA configurations
which are fairly rare in the wild and will not speed up more
conventional systems.

* Before and after timings on the 2990WX.

cd /tmp/src
time make -j 128 nativekernel NO_MODULES=TRUE > /dev/null

BEFORE
703.915u 167.605s 0:49.97 1744.0% 9993+749k 22188+8io 216pf+0w
699.550u 171.148s 0:50.87 1711.5% 9994+749k 21066+8io 150pf+0w

AFTER
678.406u 108.857s 0:45.66 1724.1% 10105+757k 22188+8io 216pf+0w
674.805u 115.256s 0:46.67 1692.8% 10077+755k 21066+8io 150pf+0w

This is a 4.2 second difference on the second run, an over 8%
improvement which is nothing to sneeze at.
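
The averaging-and-stealing step described in this commit can be sketched roughly as follows. This is a userland illustration under stated assumptions, not the kernel's code: `rebalance_queues` and the `queue_free` counter array are hypothetical names, and the real VM page queues hold page structures rather than plain counters.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: level out per-queue free page counts.
 * queue_free[i] is the number of free pages on queue i.  Queues
 * below the average take pages, one at a time, from queues above
 * the average, using a simple rotating donor iterator.
 */
static void
rebalance_queues(long *queue_free, size_t nqueues)
{
	long total = 0;
	long avg;
	size_t i;
	size_t donor = 0;	/* rotating donor iterator */

	if (nqueues == 0)
		return;
	for (i = 0; i < nqueues; ++i)
		total += queue_free[i];
	avg = total / (long)nqueues;

	for (i = 0; i < nqueues; ++i) {
		while (queue_free[i] < avg) {
			size_t tries = 0;

			/* advance to the next queue holding more than avg */
			while (queue_free[donor] <= avg && tries < nqueues) {
				donor = (donor + 1) % nqueues;
				++tries;
			}
			if (queue_free[donor] <= avg)
				return;		/* nothing left to steal */
			--queue_free[donor];
			++queue_free[i];
		}
	}
}
```

In this sketch a donor is never drained below the average, so queues with direct-attached memory keep their share while the memory-less ones are topped up.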


# c70d4562 23-Aug-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Update AMD topology detection, scheduler NUMA work (TR2)

* Update AMD topology detection to use the correct cpuid. It
now properly detects the Threadripper 2990WX as having four nodes
with 8 cores and 2 threads per core, per node. It previously detected
the chip as one node with 32 cores and 2 threads per core.

* Report the basic detected topology without requiring bootverbose.

* Record information about how much memory is attached to each node.
We previously just assumed that it was symmetric. This will be
used by the scheduler.

* Fix instability in the scheduler when running on a large number
of cores. Flag 0x08 (on by default) is needed to actively
schedule overloaded threads onto other cores, but this operation
was being executed on all cores simultaneously which throws the
uload/ucount metrics into an unstable state, causing threads to
bounce around longer than necessary.

Fix by round-robining the operation based on something similar to
sched_ticks % cpuid.

This significantly improves heavy multi-tasking performance on systems
with many cores.

* Add memory-on-node weighting to the scheduler. This detects asymmetric
NUMA configurations for situations where not all DIMM slots have been
populated, and for CPUs which are naturally asymmetric such as the
2990WX which only has memory directly connected to two of its four
nodes.

This change will preferentially schedule threads onto nodes with
greater amounts of attached memory under light loads, and dig into
the less desirable cpu nodes as the load increases.
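
The round-robin idea from the scheduler-stability item can be illustrated with a small sketch. The commit message only says the operation is staggered by "something similar to sched_ticks % cpuid"; the exact expression below (tick modulo the cpu count, compared against the cpu id) is an assumption, and `my_turn_to_rebalance` is a hypothetical name.

```c
/*
 * Hypothetical sketch: return nonzero if the cpu identified by cpuid
 * should run the rebalancing pass on this scheduler tick.  Exactly one
 * cpu is selected per tick, so the cpus take turns instead of all
 * acting on the load metrics simultaneously.
 */
static int
my_turn_to_rebalance(unsigned int sched_ticks, int cpuid, int ncpus)
{
	if (ncpus <= 0)
		return 0;
	return ((int)(sched_ticks % (unsigned int)ncpus) == cpuid);
}
```

With four cpus, tick 5 selects cpu 1 only, tick 6 selects cpu 2 only, and so on.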


Revision tags: v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 3a3b0c3a 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - vmm_init() must run after SMP startup

* vmm_init() must run after SMP startup (fix bug introduced by recent
commits).

* cleanup.


# 6f2099fe 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)

* Fix miscellaneous bugs in the recent NUMA commits.

* Add kern.numa_disable. Setting this to 1 in /boot/loader.conf will
disable the NUMA code. Note that NUMA is only applicable on multi-socket
systems.


# c7f9edd8 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions

* Add NUMA awareness to the kernel memory subsystem. This first iteration
will primarily affect user pages. kmalloc and objcache are not
NUMA-friendly yet (and it's questionable how useful it would be to make
them so).

* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket
xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.


Revision tags: v4.6.1, v4.6.0
# 9002b0d5 30-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpu localization for VM page allocations (2)

* Finish up the refactoring. Localize backoffs for search failures
by doing a masked domain search. This avoids bleeding into non-local
page queues until we've completely exhausted our local queues,
regardless of the starting pg_color index.

* We try to maintain 16-way set associativity for VM page allocations
even if the topology does not allow us to do it perfectly. So, for
example, a 4-socket x 12-core (48-core) opteron can break the 256
queues into 4 x 64 queues, then split the 12-cores per socket into
sets of 3 giving 16 queues (the minimum) to each set of 3 cores.

* Refactor the page-zeroing code to only check the localized area.
This fixes a number of issues related to the zeroed pages in the
queues winding up severely unbalanced. Other cpus in the local
group can help replenish a particular cpu's pre-zeroed pages but
we intentionally allow a heavy user to exhaust the pages.

* Adjust the cpu topology code to normalize the physical package id.
Some machines start at 1, some machines start at 0. Normalize
everything to start at 0.
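
The queue-partitioning arithmetic in the 16-way set associativity item can be worked through in a short sketch. The 256-queue total and the 16-queue minimum come from the commit message; the `pq_layout` structure and `pq_partition` function are hypothetical names used for illustration, and the kernel's actual computation may differ.

```c
#define DEMO_PQ_TOTAL		256	/* total page queues (per the message) */
#define DEMO_PQ_MIN_WAYS	16	/* minimum queues per group of cores */

struct pq_layout {
	int queues_per_socket;
	int sets_per_socket;
	int cores_per_set;
	int queues_per_set;
};

/* Hypothetical sketch of splitting the queues by socket, then by core set. */
static struct pq_layout
pq_partition(int nsockets, int cores_per_socket)
{
	struct pq_layout l;

	if (nsockets < 1)
		nsockets = 1;
	if (cores_per_socket < 1)
		cores_per_socket = 1;

	l.queues_per_socket = DEMO_PQ_TOTAL / nsockets;
	/* most sets we can afford while keeping 16 queues in each */
	l.sets_per_socket = l.queues_per_socket / DEMO_PQ_MIN_WAYS;
	if (l.sets_per_socket < 1)
		l.sets_per_socket = 1;	/* fall back to per-socket only */
	if (l.sets_per_socket > cores_per_socket)
		l.sets_per_socket = cores_per_socket;
	/* round up so every core belongs to some set */
	l.cores_per_set = (cores_per_socket + l.sets_per_socket - 1) /
	    l.sets_per_socket;
	l.queues_per_set = l.queues_per_socket / l.sets_per_socket;
	return l;
}
```

For the 4-socket x 12-core example, `pq_partition(4, 12)` yields 64 queues per socket, 4 sets of 3 cores, and 16 queues per set, matching the figures quoted in the message.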


# 33ee48c4 30-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpu localization for VM page allocations

* Change how cpu localization works. The old scheme was extremely unbalanced
in terms of vm_page_queue[] load.

The new scheme uses cpu topology information to break the vm_page_queue[]
down into major blocks based on the physical package id, minor blocks
based on the core id in each physical package, and then by 1's based on
(pindex + object->pg_color).

If PQ_L2_SIZE is not big enough such that 16-way operation is attainable
by physical and core id, we break the queue down only by physical id.

Note that the core id is a real core count, not a cpu thread count, so
an 8-core/16-thread x 2 socket xeon system will just fit in the 16-way
requirement (there are 256 PQ_FREE queues).

* When a particular queue does not have a free page, iterate nearby queues
starting at +/- 1 (previously we started at +/- PQ_L2_SIZE/2), in an attempt to
retain as much locality as possible. This won't be perfect but it should
be good enough.

* Also fix an issue with the idlezero counters.
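
The nearby-queue search described above (start at +/- 1 and widen outward) might look like the following sketch. The queue count of 256 is taken from the message's mention of 256 PQ_FREE queues; `DEMO_PQ_L2_SIZE`, `find_nearby_queue`, and the `free_count` array are hypothetical stand-ins for the kernel's real structures.

```c
#define DEMO_PQ_L2_SIZE	256			/* assumed power of two */
#define DEMO_PQ_L2_MASK	(DEMO_PQ_L2_SIZE - 1)

/*
 * Hypothetical sketch: return the index of the queue closest to
 * 'start' that has a free page, checking start, then start+1,
 * start-1, start+2, start-2, ... with wraparound.  Returns -1 if
 * every queue is empty.
 */
static int
find_nearby_queue(const int *free_count, int start)
{
	int dist;

	start &= DEMO_PQ_L2_MASK;
	if (free_count[start] > 0)
		return start;
	for (dist = 1; dist <= DEMO_PQ_L2_SIZE / 2; ++dist) {
		int up = (start + dist) & DEMO_PQ_L2_MASK;
		int down = (start + DEMO_PQ_L2_SIZE - dist) & DEMO_PQ_L2_MASK;

		if (free_count[up] > 0)
			return up;
		if (free_count[down] > 0)
			return down;
	}
	return -1;
}
```

Searching outward from the starting index keeps a fallback allocation as close as possible to the requesting core's own queues.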


Revision tags: v4.6.0rc2, v4.6.0rc, v4.7.0
# d8f4ebf4 23-Apr-2016 Charlie Root <root@apollo.backplane.com>

kernel - Reduce BSS size to fix loader initrd problem

* kernel + modules + initrd.img (unpacked) exceeded the 63MB the loader has
available for load-time data.

* Top hogs are mainly in BSS. Make intr_info_ary and pcpu_sysctl
kmalloc after boot instead of BSS as a temporary fix.

335872 ifnet_threads
335872 netisr_cpu
339968 dummy_pcpu
344064 bsd4_pcpu
344064 stoppcbs
346112 softclock_pcpu_ary
538624 cpu_topology_nodes
755712 dfly_pcpu
786432 icu_irqmaps
958464 map_entry_init
1048576 idt_arr
1064960 pcpu_sysctl <---- now kmallocd
1179648 ioapic_irqmaps <---- (used too early, cannot be kmallocd)
5242880 intr_info_ary <---- now kmallocd

* Should fix loader issues when trying to use initrd.img[.gz] for now.

Reported-by: Valheru


Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc
# d452e98b 05-Jun-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

cpu_topo: Add get_cpu_node_by_chipid()

This function retrieves the cpu_node that matches the chip ID passed in.


# 52a4925c 12-May-2015 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve cpu topology text output

* Fix a bug in the cpu range display so that it properly
displays e.g. cpu 3 through cpu 3 as cpu 3 instead
of cpu 3-3.

* Makes sysctl hw.cpu_topology more readable.
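
The range-display fix amounts to special-casing a range whose first and last cpu are the same. A minimal sketch, assuming a hypothetical `format_cpu_range` helper and the "cpu N" / "cpu N-M" spelling used in the message (the kernel's real output format may differ):

```c
#include <stdio.h>

/*
 * Hypothetical sketch: format a cpu range into buf.  A single-cpu
 * range prints as "cpu 3" rather than "cpu 3-3".  Returns the
 * snprintf() result.
 */
static int
format_cpu_range(char *buf, size_t len, int first, int last)
{
	if (first == last)
		return snprintf(buf, len, "cpu %d", first);
	return snprintf(buf, len, "cpu %d-%d", first, last);
}
```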


Revision tags: v4.0.5
# f3f3eadb 12-Mar-2015 Sascha Wildner <saw@online.de>

kernel: Move semicolon from the definition of SYSINIT() to its invocations.

This affected around 70 of our (more or less) 270 SYSINIT() calls.

style(9) advocates the terminating semicolon to be supplied by the
invocation too, because it can make life easier for editors and other
source code parsing programs.


Revision tags: v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# 399efd7f 13-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Reduce console spam in verbose mode when printing cpu sets

* Add helper function kprint_cpuset().

* Print cpu ranges when printing out cpu sets.

* Print cpu ranges when generating topology output for sysctl


# c07315c4 04-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpumask_t to extend cpus past 64, part 1/2

* 64-bit systems only. 32-bit builds use the macros but cannot be expanded
past 32 cpus.

* Change cpumask_t from __uint64_t to a structure. This commit implements
one 64-bit sub-element (the next one will implement four for 256 cpus).

* Create a CPUMASK_*() macro API for non-atomic and atomic cpumask
manipulation. These macros generally take lvalues as arguments, allowing
for a fairly optimal implementation.

* Change all C code operating on cpumasks to use the newly created CPUMASK_*()
macro API.

* Compile-test 32 and 64-bit. Run-test 64-bit.

* Adjust sbin/usched, usr.sbin/powerd. usched currently needs more work.
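
To illustrate the structure-plus-macros approach, here is a userland sketch of a multi-word cpu mask. All `DEMO_` names are hypothetical and deliberately distinct from the kernel's CPUMASK_*() macros, whose exact names and semantics are not reproduced here; the four-element layout follows the message's note that a later commit extends the mask to 256 cpus.

```c
#include <stdint.h>

#define DEMO_CPUMASK_ELEMENTS	4	/* 4 x 64 bits = 256 cpus */

typedef struct {
	uint64_t ary[DEMO_CPUMASK_ELEMENTS];
} demo_cpumask_t;

/* word index and bit within the word for cpu number i */
#define DEMO_CPUMASK_IDX(i)	(((i) >> 6) & (DEMO_CPUMASK_ELEMENTS - 1))
#define DEMO_CPUMASK_BIT(i)	((uint64_t)1 << ((i) & 63))

/* The macros take the mask as an lvalue, not a pointer. */
#define DEMO_CPUMASK_ASSZERO(mask)					\
	do {								\
		int i_;							\
		for (i_ = 0; i_ < DEMO_CPUMASK_ELEMENTS; ++i_)		\
			(mask).ary[i_] = 0;				\
	} while (0)
#define DEMO_CPUMASK_ORBIT(mask, i)					\
	((mask).ary[DEMO_CPUMASK_IDX(i)] |= DEMO_CPUMASK_BIT(i))
#define DEMO_CPUMASK_NANDBIT(mask, i)					\
	((mask).ary[DEMO_CPUMASK_IDX(i)] &= ~DEMO_CPUMASK_BIT(i))
#define DEMO_CPUMASK_TESTBIT(mask, i)					\
	(((mask).ary[DEMO_CPUMASK_IDX(i)] & DEMO_CPUMASK_BIT(i)) != 0)
```

Hiding the representation behind macros is what lets callers stay unchanged when the mask grows from one 64-bit word to several.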


Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2
# 9fc6d334 28-Mar-2014 Sascha Wildner <saw@online.de>

kernel/cpu_topology: Fix a casting issue in the topology tree printing.

compute_unit_id is uint8_t so it's actually 0xff we want to check for.
Before this commit, the "... != -1" checks were always true.
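
The casting issue follows from C's integer promotions: a uint8_t operand is promoted to int (range 0 to 255) before the comparison, so it can never equal -1. A small sketch with hypothetical function names shows the difference; some compilers emit a warning for the first comparison precisely because its result is constant.

```c
#include <stdint.h>

/*
 * Buggy form: id is promoted to int (0..255) and compared with -1,
 * so the test is true for every possible value, including 0xff.
 */
static int
id_is_valid_buggy(uint8_t id)
{
	return (id != -1);
}

/* Fixed form: compare against the 8-bit "unset" value directly. */
static int
id_is_valid_fixed(uint8_t id)
{
	return (id != 0xff);
}
```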


Revision tags: v3.6.1
# 493a3e86 13-Feb-2014 Sascha Wildner <saw@online.de>

kernel: Fix topology fallout for vkernel and i386.

* No need to fix AMD topology in vkernel.

* Expose compute_unit_id in i386 too. It stays -1 always.

Reported-by: tuxillo


# 0e9325d3 12-Feb-2014 Mihai Carabas <mihai.carabas@gmail.com>

CPU Topology: add support for Compute Units on AMD processors

Detect shared compute units between cores on AMD processors and downgrade
them to THREAD_LEVEL in the logical CPU topology used by the
scheduler.


Revision tags: v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2
# 1918fc5c 24-Oct-2012 Sascha Wildner <saw@online.de>

kernel: Make SMP support default (and non-optional).

The 'SMP' kernel option gets removed with this commit, so it has to
be removed from everybody's configs.

Reviewed-by: sjg
Approved-by: many


Revision tags: v3.2.1, v3.2.0, v3.3.0
# e28d8b15 18-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - add usched_dfly algorithm, set as default for now

* Fork usched_bsd4 for continued development.

* Rewrite the bsd4 scheduler to use per-cpu spinlocks and queues.

* Reformulate the cpu selection algorithm using the topology info.
We now do a top-down iteration instead of a bottom-up iteration
to calculate the best cpu node to schedule something to.

Implements both thread push to remote queue and pull from remote queue.

* Track a load factor on a per-cpu basis.


# f77c018a 22-Aug-2012 Mihai Carabas <mihai.carabas@gmail.com>

CPU topology support

* Part of "Add SMT/HT awareness to DragonFly BSD scheduler" GSoC
project.

* Details at: http://leaf.dragonflybsd.org/mailarchive/kernel/2012-08/msg00009.html

Mentored-by: Alex Hornung (alexh@)
Sponsored-by: Google Summer of Code 2012
