Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1
# cac12823 | 24-Jun-2021 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add sequential TSC test, refactor concurrency test

* Add a test that tests the TSC, checking whether rdtsc() returns monotonically increasing values when hopping between CPUs. We test four times, from cpu 0 to (ncpus-1) and from (ncpus-1) back to 0.

* Revamp the TSC concurrency test. Make the test a bit more robust to better adapt to HVMs.

* Report all results in nanoseconds. Example output from a Threadripper:

    TSC cpu-delta test complete, 1472nS to 11281nS SUCCESS
    TSC cpu-delta test complete, 1312nS to 3386nS SUCCESS
    TSC cpu-delta test complete, 1312nS to 3526nS SUCCESS
    TSC cpu-delta test complete, 1312nS to 3396nS SUCCESS
    TSC cpu concurrency test complete, worst=320ns, avg=60ns SUCCESS
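The cpu-hopping check above can be modeled from userland. The sketch below is a hypothetical Linux/x86_64 analogue (it uses sched_setaffinity() to hop between cpus and a raw rdtsc to sample); it is not the kernel's implementation, and the function name tsc_seq_test is mine:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t
rdtsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hop from cpu 0 up to (ncpus-1), then back down to 0, sampling the TSC
 * on each cpu.  Returns how many hops observed the TSC stepping
 * backwards (0 when the TSCs are synchronized across cpus).
 */
static int
tsc_seq_test(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    uint64_t last = 0;
    int backwards = 0;
    cpu_set_t set;
    long i, cpu;

    for (i = 0; i < 2 * ncpus; ++i) {
        cpu = (i < ncpus) ? i : 2 * ncpus - 1 - i;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            continue;           /* cpu offline, skip this hop */
        uint64_t t = rdtsc();
        if (t < last)
            ++backwards;
        last = t;
    }
    return backwards;
}
```

On hardware with an invariant, MP-synchronized TSC the function should report zero backwards steps; on desynchronized TSCs it reports how often the hop went backwards.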
Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# db2ec6f8 | 25-Oct-2020 | Sascha Wildner <saw@online.de>

kernel: Staticize some variables in platform/pc64.

Also, remove some unused variables and move some extern declarations to header files.
Revision tags: v5.8.3, v5.8.2
# feadd4ae | 11-Jun-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor sysclock_t from 32 to 64 bits (2)

* Cputimer reload values can be negative, check condition and set a small positive reload value instead.

* Also avoids muldivu64() overflow warnings on the console.

Reported-by: kworr
# 8fbc264d | 09-Jun-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor sysclock_t from 32 to 64 bits

* Refactor the core cpu timer API, changing sysclock_t from 32 to 64 bits. Provide a full 64-bit count from all sources.

* Implement muldivu64() using gcc's 128-bit integer type. This function takes three 64-bit values, performs (a * b) / d using a 128-bit intermediate calculation, and returns a 64-bit result.

  Change all timer scaling functions to use this function, which effectively gives systimers the capability of handling any timeout that fits in 64 bits at the timer's resolution.

* Remove TSC frequency scaling, it is no longer needed. The TSC timer is now used at its full resolution.

* Use atomic_fcmpset_long() instead of a clock spinlock when updating the msb bits for hardware timer sources less than 64 bits wide.

* Properly recalculate existing systimers when the clock source is changed. Existing systimers were not being recalculated, leading to the system failing to boot when time sources had radically different clock frequencies.
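The 128-bit intermediate in muldivu64() can be sketched in a few lines of standalone C. This is a minimal model assuming gcc/clang's unsigned __int128 extension, not the kernel's exact code:

```c
#include <stdint.h>

/*
 * Sketch of muldivu64(): compute (a * b) / d on 64-bit operands without
 * overflow by widening the product to 128 bits (gcc/clang extension).
 */
static uint64_t
muldivu64(uint64_t a, uint64_t b, uint64_t d)
{
    return (uint64_t)(((unsigned __int128)a * b) / d);
}
```

For example, scaling a count of 10^10 ticks by the ratio 3000000000/1000000000 produces a 3 * 10^19 intermediate product, which overflows 64 bits but is exact in the 128-bit intermediate.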
Revision tags: v5.8.1
# 63823918 | 11-Mar-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Allow 8254 timer to be forced, clean-up user/sys/intr/idle

* Allows the 8254 timer to be forced on for machines which do not support the LAPIC timer during deep-sleep. Fix an assertion that occurs in this situation.

    hw.i8254.intr_disable="0"

* Adjust the statclock to calculate user/sys/intr/idle time properly when the clock interrupt occurs from an interrupt thread instead of from a hard interrupt.

  Basically, when the clock interrupt occurs from an interrupt thread, we have to look at curthread->td_preempted instead of curthread.

  In addition, RQF_INTPEND will be set across the call due to the way processing works, so we have to look at the bitmask of interrupt sources instead of this bit.

Reported-by: CuteLarva
Revision tags: v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2
# 15729594 | 28-Dec-2018 | Imre Vadász <imre@vdsz.com>

vkernel - Delete unused/ancient timer/rtc function declarations in clock.h.

* Also get rid of the tsc_is_broken flag, which is completely unused.
# 7b21e5e4 | 25-Dec-2018 | Imre Vadász <imre@vdsz.com>

kernel - Factor out TSC cputimer into common x86_64 code, use for vkernel.

* This adds a command line flag -T to the vkernel, to force disable use of the TSC cputimer.

* By default the TSC will be used as a cputimer for the vkernel when the TSC is invariant and mpsync according to the hw.tsc_invariant and hw.tsc_mpsync sysctl values of the host.
Revision tags: v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# 1eb5a42b | 17-Aug-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor the TSC MP synchronization test

* Refactor the TSC MP synchronization test. Do not use cpusync. Using cpusync results in O(N x N) worth of overhead instead of O(N) worth of overhead.

  Instead, have the per-cpu threads run the test simultaneously using each other's data.

* We synchronize to the last TSC element that was saved on each cpu. This probably needs a bit of work to ensure determinism, but at the moment it's good in that it synchronizes all cores off of a single cache mastership change, instead of having them all compete for cache mastership.

* Probably needs some fine tuning; at the moment I allow a slop of 10uS, which is almost certainly too much. Note, however, that SMP interactions can create ~1uS latencies on particular memory accesses.

* Solves serious issues with the old test on 64 cpu threads. These issues may also have been related to the ipiq fifo size being too small.
Revision tags: v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc
# 4098a6e5 | 25-Feb-2018 | Imre Vadász <imre@vdsz.com>

pc64 - Improve TSC and LAPIC timer calibration code.

* The hw.tsc_calibrate_test=1 and hw.lapic_calibrate_test=1 tunables can be specified to test results of the calibration for different delays (from 100 milliseconds to 2 seconds in 100 millisecond steps).

* With this change the TSC and LAPIC calibration each should take only 200 milliseconds, instead of the original 1 second and 2 second delays.

* This change tries to make the TSC calibration more exact by averaging the TSC values from before and after reading the timer. By sampling the latency of reading the (HPET) timer, we can make sure that the start and end measurements of TSC and the (typically HPET or i8254) timer didn't get interrupted (e.g. by an SMI on hardware, or by the host when running virtualized), and filter out those outliers.

* Additionally, for the TSC calibration the new code does 2 measurements at the start and end of the delay time, separated by 20 milliseconds. This should make results even more consistent.

* The hw.calibrate_tsc_fast=0 tunable can be set to revert to the old TSC calibration code.

* Use the TSC to calibrate the LAPIC timer when the TSC is invariant. Although this indirect calibration might accumulate inaccuracies, this still seems better. Since the TSC runs very fast, we can get a very accurate value in 200ms or even less. To forcibly disable the TSC based LAPIC calibration, set the hw.lapic_calibrate_fast=0 loader tunable.

* The fallback (without using the TSC) LAPIC calibration is slightly improved by measuring the sysclock timestamp at the start and end of the measurement explicitly with sys_cputimer->count(). Also, the LAPIC timer is explicitly read after starting the countdown. It also proves to be useful in at least some virtualization environments (e.g. QEMU with TCG emulation) to do some LAPIC timer access before actually measuring anything.

* The HPET and LAPIC mmio read accesses are not a barrier for Intel and AMD cpus, so we explicitly have to avoid out-of-order execution of the rdtsc() call that follows the sys_cputimer->count(), by using rdtsc_ordered(), which uses lfence or mfence on Intel and AMD CPUs respectively.
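The outlier filter described above can be sketched as a small pure function: bracket the (slow) reference-timer read with two (fast) TSC reads, reject the sample if the bracket took too long, otherwise pair the timer count with the midpoint of the two TSC reads. This is a hypothetical model — the names pair_sample, maxlat, and struct sample are mine, not the kernel's:

```c
#include <stdint.h>

/* One accepted (TSC, reference-timer) pair. */
struct sample {
    uint64_t tsc;   /* midpoint of the two bracketing TSC reads */
    uint64_t ref;   /* reference timer count read between them */
};

/*
 * Accept a sample only if the bracket t1..t2 around the reference timer
 * read stayed under 'maxlat' TSC ticks; a longer bracket means an SMI
 * or the hypervisor interrupted us, so the pairing would be skewed.
 */
static int
pair_sample(uint64_t t1, uint64_t ref, uint64_t t2,
    uint64_t maxlat, struct sample *out)
{
    if (t2 - t1 > maxlat)
        return 0;               /* outlier; the caller retries */
    out->tsc = t1 + (t2 - t1) / 2;
    out->ref = ref;
    return 1;
}
```

Two accepted pairs taken a known delay apart then give the TSC frequency as (tsc2 - tsc1) scaled by the reference timer's frequency over (ref2 - ref1).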
# 1a3a6cee | 18-Feb-2018 | Imre Vadász <imre@vdsz.com>

pc64 - Allow for initializing other cputimers than i8254 in early boot.
# 8a93c79f | 04-Mar-2018 | Imre Vadász <imre@vdsz.com>

pc64 - Unmask some AMD Family 15h and 16h CPUs for TSC mpsync test.

* The problematic Erratum 778 "Processor Core Time Stamp Counters May Experience Drift" is only listed for Family 15h < Model 30h and for Family 16h < Model 30h (Family 15h == Bulldozer, and Family 16h == Jaguar).
Revision tags: v5.0.2, v5.0.1
# 3a80fe2b | 28-Oct-2017 | Sepherosa Ziehau <sephe@dragonflybsd.org>

x86_64: Add pauses in the TSC mpsync testing loop.

This fixes Intel N3450 deadlock in the tight rdtsc/IPI loop.

Suggested-by: dillon@
Tested-by: mneumann@
Dragonfly-bug: http://bugs.dragonflybsd.org/issues/3087
# 33bb59d9 | 24-Oct-2017 | Sepherosa Ziehau <sephe@dragonflybsd.org>

x86_64: Allow TSC MP synchronization test to be disabled.
Revision tags: v5.0.0
# 5b49787b | 05-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor smp collision statistics

* Add an indefinite wait timing API (sys/indefinite.h, sys/indefinite2.h). This interface uses the TSC and will record lock latencies to our pcpu stats in microseconds. The systat -pv 1 display shows this under smpcoll.

  Note that latencies generated by tokens, lockmgr, and mutex locks do not necessarily reflect actual lost cpu time, as the kernel will schedule other threads while those are blocked, if other threads are available.

* Formalize TSC operations more, supplying types (tsc_uclock_t and tsc_sclock_t).

* Reinstrument lockmgr, mutex, token, and spinlocks to use the new indefinite timing interface.
Revision tags: v5.0.0rc2, v5.1.0, v5.0.0rc1
# 242fd95f | 31-Aug-2017 | Imre Vadász <imre@vdsz.com>

pc64: Turn CLK_USE_* flags into tunable, skip code when results are unused.

* This should save ca. 1s during bootup, by skipping the i8254 and TSC calibration based on the RTC when we aren't actually using the result in any way.

* For now, continue running that calibration code when booting in verbose mode, to keep printing these potentially useful calibration results.

* Setting the hw.calibrate_timers_with_rtc tunable to 1 will enable this calibration code and use those results for the i8254 and TSC frequencies.
Revision tags: v4.8.1
# 79c04d9c | 27-Mar-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Use the TSC as the cpu clock on AMD Ryzen or later

* The TSC is usable as the cpu clock on AMD Ryzen or later, adjust the code for this.

* Recode hw.tsc_cputimer_force to still run the TSC test, but then force use of the TSC for the cpu clock whether the test succeeds or fails.

Suggested-by: Sephe
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 6b91ee43 | 14-Feb-2017 | Sepherosa Ziehau <sephe@dragonflybsd.org>

kern: Add cpucounter which returns 64bit monotonic counter.

It will be used to:
- Simplify per-cpu raw time extraction.
- ALTQ machclk.
- Per packet timestamp for CoDel.

As of this commit, dummy cpucounter, which falls back to cputimer, and TSC cpucounter are implemented.
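A cputimer-backed fallback has to widen a counter narrower than 64 bits into the promised 64-bit monotonic value. Below is a hypothetical standalone model of that msb-extension, using C11 atomics in place of the kernel's atomic_fcmpset_long(); the names counter64 and state are mine:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Model of widening a 32-bit free-running hardware count to 64 bits.
 * 'state' caches the last 64-bit value handed out; when the 32-bit
 * hardware count drops below the cached low bits, the counter wrapped,
 * so carry one into the msb half.  A compare-exchange publishes the new
 * value lock-free instead of taking a spinlock.
 */
static _Atomic uint64_t state;

static uint64_t
counter64(uint32_t hwcount)
{
    uint64_t old, new;

    old = atomic_load(&state);
    do {
        new = (old & 0xffffffff00000000ULL) | hwcount;
        if (new < old)                  /* 32-bit wrap detected */
            new += 0x100000000ULL;
    } while (!atomic_compare_exchange_weak(&state, &old, new));
    return new;
}
```

On compare-exchange failure 'old' is reloaded and the carry decision is recomputed, so concurrent callers all observe a monotonic 64-bit count.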
# 0087561d | 24-Jan-2017 | Sepherosa Ziehau <sephe@dragonflybsd.org>

cputimer: Initialize explicitly.
Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 42098fc3 | 14-Jun-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>

cputimer: Add per-cpu handler and private data for interrupt cputimer.
# 632f4575 | 12-Jun-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>

tsc: Log the final TSC frequency
Revision tags: v4.4.3, v4.4.2
# 2c64e990 | 25-Jan-2016 | zrj <rimvydas.jasinskas@gmail.com>

Remove advertising header from sys/

Correct BSD License clause numbering from 1-2-4 to 1-2-3. Some less clear cases were taken as it was done in FreeBSD.
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1
# ce7866b8 | 14-Jul-2015 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix live lock in vfs_conf.c mountroot>

* The mountroot> prompt calls cngetc() to process user input. However, this function hard-loops and can prevent other kernel threads from running on the current cpu.

* Rearrange the code to use cncheckc() and a 1/25 second tsleep().

* Fix a bug in the syscons code where NOKEY was not being properly returned as documented. Modify all use cases to handle NOKEY. This allows us to differentiate between a keyboard being present with no key pressed and a keyboard not being present.

* Pull the automatic polling mode code out of cncheckc() (or more precisely, out of sccncheckc()) and add a new cnpoll() API function to set it manually.

  This fixes issues in vfs_conf when normal keyboard processing interrupts are operational and cncheckc() is used with a tsleep() delay. The normal processing interrupt wound up eating the keystrokes, so the cncheckc() basically always failed.

  cncheckc() in general also always had a small window of opportunity where a keystroke could be lost due to loops on it.

* Call cnpoll() in various places, such as when entering the debugger, asking for input in vfs_conf, and a few other places.
Revision tags: v4.2.3, v4.2.1
# e28c8ef4 | 27-Jun-2015 | Sascha Wildner <saw@online.de>

kernel: Use 'normal' types (i.e., uint8_t instead of __uint8_t).
Revision tags: v4.2.0, v4.0.6, v4.3.0, v4.2.0rc
# 59878316 | 01-Jun-2015 | Sepherosa Ziehau <sephe@dragonflybsd.org>

cputimer/tsc: Prevent rdtsc reordering

Use lfence on Intel and mfence on AMD to make sure that all instructions before rdtsc are completed. This should prevent time warps if TSC is selected as cputimer.
# ea9728ca | 01-Jun-2015 | Sepherosa Ziehau <sephe@dragonflybsd.org>

tsc: Factor out rdtsc_ordered()

Use lfence on Intel and mfence on AMD to make sure that all instructions before rdtsc are completed.

While I'm here:
- Remove redundant function declarations in lwkt_thread.c to unbreak compile.
- Add cpu_vendor_id for vkernel64; extra work is needed to set it to a proper value.
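A minimal sketch of rdtsc_ordered() for gcc/clang on x86_64, assuming lfence unconditionally rather than the kernel's per-vendor lfence/mfence selection:

```c
#include <stdint.h>

/*
 * Read the TSC with a preceding lfence so instructions issued before
 * the call complete before the timestamp is taken; this keeps the CPU
 * from hoisting rdtsc earlier in program order.
 */
static inline uint64_t
rdtsc_ordered(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("lfence; rdtsc"
        : "=a" (lo), "=d" (hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Without the fence, two adjacent reads can appear to go backwards because the second rdtsc executed speculatively ahead of preceding loads; the fence makes successive ordered reads monotonic on a synchronized TSC.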