kern_tc.c - OpenGrok history log for /openbsd/sys/kern/kern

Revision	Date	Author	Comments
# 127fa8d5	23-Feb-2024	cheloha <cheloha@openbsd.org>	timecounting: start system uptime at 0.0 instead of 1.0 OpenBSD starts the system uptime clock at 1.0 instead of 0.0. We inherited this behavior from FreeBSD when we imported kern_tc.c. patrick@ reports that this causes a problem in sdmmc(4) during boot: the sdmmc_delay() call in sdmmc_init() doesn't block for the full 250ms. This happens because the system hardclock() starts at 0.0 and executes about hz times, rapidly, to "catch up" to 1.0. This instantly expires the first hz timeout ticks, hence the short sleep. Starting the system uptime at 0.0 fixes the problem. Prompted by patrick@. Tested by patrick@. In snaps since Feb 19 2023. Thread: https://marc.info/?l=openbsd-tech&m=170830229732396&w=2 ok patrick@ deraadt@
# 24ee467d	04-Feb-2023	cheloha <cheloha@openbsd.org>	timecounting: remove incomplete PPS support The timecounting code has had stubs for pulse-per-second (PPS) polling since it was imported in 2004. At this point it seems unlikely that anyone is going to finish adding PPS support, so let's remove the stubs: - Delete the dead tc_poll_pps() call from tc_windup(). - Remove all tc_poll_pps symbols from the kernel. Link: https://marc.info/?l=openbsd-tech&m=167519035723210&w=2 ok miod@
# 6fba7c69	13-Dec-2022	cheloha <cheloha@openbsd.org>	timecounting: add getbinruntime(), getnsecruntime() The networking people want a fast, monotonic clock that only advances while the system is not suspended. The runtime clock satisfies most of these requirements, so introduce getnsecruntime() to provide a fast means for reading it. Based on patches from jca@ and claudio@. ok yasuoka@
# 2b46a8cb	05-Dec-2022	deraadt <deraadt@openbsd.org>	zap a pile of dangling tabs
# a6064e19	08-Nov-2022	cheloha <cheloha@openbsd.org>	tc_setclock: don't print a warning if tc_windup() rejects inittodr(9) time During resume, it isn't necessarily a problem if the UTC time we get from inittodr(9) lags behind the system UTC clock. In particular, if the active timecounter's frequency is low enough, tc_delta() might not overflow across a brief suspend. Remove the misleading warning message. The code is behaving as intended, just not in a way I anticipated when I added the warning message a few years ago. Discovered by kettenis@. Root cause isolated with kettenis@. Link: https://marc.info/?l=openbsd-tech&m=166790845619897&w=2 ok mlarkin@ kettenis@
# 4c0ab428	18-Sep-2022	cheloha <cheloha@openbsd.org>	timecounting: tc_reset_quality: print notice if active counter changes Give the user a hint as to what happened if they boot up and the TSC is not the active counter. "sure" deraadt@
# 78156938	12-Aug-2022	cheloha <cheloha@openbsd.org>	amd64: simplify TSC synchronization testing Computing a per-CPU TSC skew value is error-prone, especially on multisocket machines and VMs. My best guess is that larger latencies appear to the current skew measurement test as TSC desync, and so the TSC is demoted to a kernel timecounter on these machines or marked non-monotonic. This patch eliminates per-CPU TSC skew values. Instead of trying to measure and correct for TSC desync we only try to detect desync, which is less error-prone. This approach should allow a wider variety of machines to use the TSC as a timecounter when running OpenBSD. In the new sync test, both CPUs repeatedly try to detect whether their TSC is trailing the other CPU's TSC. The upside to this approach is that it yields no false positives. The downside to this approach is that it takes more time than the current skew measurement test. Each test round takes 1ms, and we run up to two rounds per CPU, so this patch slows boot down by 2ms per AP. If any CPU fails the sync test, the TSC is marked non-monotonic and a different timecounter is activated. The TC_USER flag remains intact. There is no middle ground where we fall back to only using the TSC in the kernel. Before running the test, we check for the IA32_TSC_ADJUST register and reset it if necessary. This is a trivial way to work around firmware bugs that desync the TSC before we reach the kernel. Unfortunately, at the moment this register appears to only be available on Intel processors. I cannot find an equivalent but differently-named MSR for AMD processors. Because there is no per-CPU skew value, there is also no concept of TSC drift anymore. Miscellaneous notes: - This patch adds a new timecounter utility function, tc_reset_quality(). Used after sync test failure to mark the TSC non-monotonic. - I have left TSC_DEBUG enabled for now. Unsure if we should leave it enabled for release or not. If we disable it we no longer run the sync test after failing it once. Running the test even after failure provides information about the desync on every CPU. - Taking 1ms per test round is fairly conservative. We can experiment with and discuss shorter test rounds. My main goal with a relatively long test round is ensuring VMs actually run the test. It would be bad if a hypervisor interrupted the test for so long that it concealed desync. - The use of two test rounds is mostly a diagnostic tool: it would be very strange if a CPU passed the first round but failed the second. If we ever saw this in the wild it would indicate something odd. - Most of the desync seen in test reports is on Ryzen CPUs. I believe, but cannot prove, that this is due to a widespread firmware bug on AMD motherboards. Hopefully AMD and/or the downstream vendors fix it. - Fixing TSC desync by writing the TSC directly with WRMSR is very difficult. The TSC is a moving target incrementing very quickly and compensating for WRMSR overhead is non-trivial. We can experiment with this, but my confidence is low that we can make it work reliably. Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by deraadt@ throughout. Reprompted by Yuichiro Naito several times. With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@. Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan, Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@, Renato Aguiar, and Timo Myyra. Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2 Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2 Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2 Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2 Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2 "just commit it" deraadt@
# 83dc7839	23-Jul-2022	cheloha <cheloha@openbsd.org>	timecounting: use full 96-bit product when computing elapsed time The timecounting subsystem computes elapsed time by scaling (64 bits) the difference between two counter values (32 bits at most) up into a struct bintime (128 bits). Under normal circumstances it is sufficient to do this with 64-bit multiplication, like this: struct bintime bt; bt.sec = 0; bt.frac = th->tc_scale * tc_delta(th); However, if tc_delta() exceeds 1 second's worth of counter ticks, that multiplication overflows. The result is that the monotonic clock appears to jump backwards. When can this happen? In practice, I have seen it when trying to compile LLVM on an EdgeRouter Lite when using an SD card as the backing disk. The box gets stuck in swap, the hardclock(9) is delayed, and we appear to "lose time". To avoid this overflow we need to compute the full 96-bit product of the delta and the scale. This commit adds TIMECOUNT_TO_BINTIME(), a function for computing that full product, to sys/time.h. The patch puts the new function to use in lib/libc/sys/microtime.c and sys/kern/kern_tc.c. (The commit also reorganizes some of our high resolution bintime code so that we always read the timecounter first.) Doing the full 96-bit multiplication is between 0% and 15% slower than doing the cheaper 64-bit multiplication on amd64. Measuring a precise difference is extremely difficult because the computation is already quite fast. I would guess that the cost is slightly higher than that on 32-bit platforms. Nobody ever volunteered to test, so this remains a guess. Thread: https://marc.info/?l=openbsd-tech&m=163424607918042&w=2 6 month bump: https://marc.info/?l=openbsd-tech&m=165124251401342&w=2 Committed after 9 months without review.
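Note on 83dc7839: the full 96-bit product can be sketched in portable C by splitting the 64-bit scale into two 32-bit halves, multiplying each half by the 32-bit delta, and carrying into the seconds field. This is only an illustration, not the TIMECOUNT_TO_BINTIME() code from sys/time.h; the names struct bintime_sketch and delta_to_bintime() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct bintime. */
struct bintime_sketch {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/*
 * Scale a 32-bit counter delta by a 64-bit scale factor and keep the
 * full 96-bit product: the low 64 bits become the fraction, the upper
 * 32 bits become whole seconds.
 */
static struct bintime_sketch
delta_to_bintime(uint32_t delta, uint64_t scale)
{
	struct bintime_sketch bt;
	/* Partial product of the upper half of scale; weight 2^32. */
	uint64_t hi = (uint64_t)delta * (scale >> 32);
	/* Partial product of the lower half of scale; weight 2^0. */
	uint64_t lo = (uint64_t)delta * (scale & 0xffffffffULL);

	bt.sec = (int64_t)(hi >> 32);	/* bits 64..95 of the product */
	bt.frac = hi << 32;		/* bits 32..63 contributed by hi */
	bt.frac += lo;
	if (bt.frac < lo)		/* unsigned wrap means a carry */
		bt.sec++;
	return bt;
}
```

With a scale of 2^63 (half a second per tick), a delta of 3 yields sec = 1 and frac = 2^63, whereas the plain 64-bit multiplication quoted in the commit message would have wrapped and lost the whole second.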
# da571ddd	24-Oct-2021	jsg <jsg@openbsd.org>	use NULL not 0 for pointer values in kern ok semarie@
# 2f582782	19-Jun-2021	cheloha <cheloha@openbsd.org>	timecounting: add FRAC_TO_NSEC(), BINTIME_TO_NSEC() Refactor the fraction-to-nanosecond conversion from BINTIME_TO_TIMESPEC() into a dedicated routine, FRAC_TO_NSEC(), so we can reuse it elsewhere. Then add a new BINTIME_TO_NSEC() function to sys/time.h to deduplicate conversion code in nsecuptime(), getnsecuptime(), and tc_setclock(). Thread: https://marc.info/?l=openbsd-tech&m=162376993926751&w=2 ok dlg@
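Note on 2f582782: a fraction-to-nanosecond conversion of the kind described can be sketched as below. The sketch assumes the fraction is a 64-bit count of 2^-64 second units and uses only its upper 32 bits so that the multiplication fits in 64 bits; the function names are hypothetical and the code is not copied from sys/time.h.

```c
#include <stdint.h>

/*
 * Convert a binary fraction of a second (units of 2^-64 s) to
 * nanoseconds.  Only the upper 32 bits of the fraction are used, so
 * (2^32 - 1) * 10^9 still fits in a uint64_t.
 */
static uint64_t
frac_to_nsec_sketch(uint64_t frac)
{
	return ((frac >> 32) * 1000000000ULL) >> 32;
}

/* Convert whole seconds plus a binary fraction to nanoseconds. */
static uint64_t
bintime_to_nsec_sketch(int64_t sec, uint64_t frac)
{
	return (uint64_t)sec * 1000000000ULL + frac_to_nsec_sketch(frac);
}
```

For example, a fraction of 2^63 (half a second) converts to 500000000 ns.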
# 4ea72498	15-Jun-2021	dlg <dlg@openbsd.org>	factor out nsecuptime and getnsecuptime. these functions were implemented in a bunch of places with comments saying it should be moved to kern_tc.c when more pop up, and i was about to add another one. i think it's time to move them to kern_tc.c. ok cheloha@ jmatthew@
# b32486e3	30-Apr-2021	bluhm <bluhm@openbsd.org>	Rearrange the implementation of bounded sysctl. The primitive functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
# 8611d3cd	23-Feb-2021	cheloha <cheloha@openbsd.org>	timecounting: use C99-style initialization for all timecounter structs The timecounter struct is large and I think it may change in the future. Changing it later will be easier if we use C99-style initialization for all timecounter structs. It also makes reading the code a bit easier. For reasons I cannot explain, switching to C99-style initialization sometimes changes the hash of the resulting object file, even though the resulting struct should be the same. So there is a binary change here, but only sometimes. No behavior should change in either case. I can't compile-test this everywhere but I have been staring at the diff for days now and I'm relatively confident this will not break compilation. Fingers crossed. ok gnezdo@
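Note on 8611d3cd: the difference between positional and C99 designated initialization can be shown with a small example. The struct below is a hypothetical, pared-down counter description, not the kernel's struct timecounter, and all field names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counter description. */
struct counter_sketch {
	uint32_t   (*get_count)(void *);
	uint32_t     mask;
	uint64_t     frequency;
	const char  *name;
	int          quality;
	void        *priv;
};

static uint32_t
dummy_get_count(void *priv)
{
	(void)priv;
	return 0;
}

/* Positional: every value depends on the current member order. */
static struct counter_sketch positional = {
	dummy_get_count, 0xffffffff, 1000000, "dummy", -1000000, NULL
};

/*
 * Designated (C99): members are named, so reordering or adding
 * members later does not silently shift values.  Members that are
 * left out (here: priv) are zero-initialized.
 */
static struct counter_sketch designated = {
	.get_count = dummy_get_count,
	.mask = 0xffffffff,
	.frequency = 1000000,
	.name = "dummy",
	.quality = -1000000,
};
```

Both objects hold the same values; only the designated form keeps doing so if a member is later inserted in the middle of the struct.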
# e0324b5e	05-Dec-2020	gnezdo <gnezdo@openbsd.org>	Convert sysctl_tc to sysctl_bounded_arr ok gkoehler@
# 97c55bcc	16-Sep-2020	cheloha <cheloha@openbsd.org>	timecounting: provide a naptime variable for userspace via kvm_read(3) vmstat(8) uses kvm_read(3) to extract the naptime from the kernel. Problem is, I deleted `naptime' from the global namespace when I moved it into the timehands. This patch restores it. It gets updated from tc_windup(). Only userspace should use it, and only when the kernel is dead. We need to tweak a variable in tc_setclock() to avoid shadowing the (once again) global naptime.
# 6e581dd8	20-Jul-2020	deraadt <deraadt@openbsd.org>	ramdisks got broken by that last diff.
# 04cecb01	20-Jul-2020	cheloha <cheloha@openbsd.org>	timecounting: add missing mutex assertion to tc_update_timekeep()
# 1fb8cdb7	20-Jul-2020	cheloha <cheloha@openbsd.org>	timecounting: misc. cleanup in tc_setclock() and tc_setrealtimeclock() - Use real variable names like "utc" and "uptime" instead of non-names like "bt" and "bt2" - Move the TIMESPEC_TO_BINTIME(9) conversions out of the critical section - Sprinkle in a little whitespace - Sort automatic variables according to style(9)
# 1988fbea	19-Jul-2020	cheloha <cheloha@openbsd.org>	tc_windup(): remove misleading comment about getmicrotime(9) Using getmicrotime(9) or getnanotime(9) is perfectly appropriate in certain contexts. The programmer needs to weigh the overhead savings against the reduced accuracy and decide whether the low-res interfaces are appropriate.
# 63cc33c4	17-Jul-2020	gkoehler <gkoehler@openbsd.org>	Read ogen from the other timehands; fixes tk_generation If th0.th_generation == th1.th_generation when we update the user timekeep page, then tk_generation doesn't change, so libc may calculate the wrong time. Now th0 and th1 share the sequence so th0.th_generation != th1.th_generation. ok kettenis@ cheloha@
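Note on 63cc33c4: the generation problem can be illustrated with a toy model of two alternating timehands. The sketch assumes, without confirmation from the source, that the old code advanced the incoming timehands' own stale generation, that the fix continues the sequence from the currently active timehands, and that generation 0 is reserved to mark an update in progress. All names and the initial generations (1 and 0) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, pared-down timehands: only the generation number. */
struct hands_sketch {
	uint32_t gen;
};

/* Advance a generation, skipping 0 (assumed to mean "being updated"). */
static uint32_t
next_gen(uint32_t ogen)
{
	if (++ogen == 0)
		ogen = 1;
	return ogen;
}

/* Assumed old behaviour: continue from the incoming hands' own value. */
static uint32_t
windup_own_gen(const struct hands_sketch *active, struct hands_sketch *next)
{
	(void)active;
	next->gen = next_gen(next->gen);
	return next->gen;
}

/* Assumed fixed behaviour: continue from the active hands' value. */
static uint32_t
windup_shared_gen(const struct hands_sketch *active, struct hands_sketch *next)
{
	next->gen = next_gen(active->gen);
	return next->gen;
}

/*
 * Run one update and report whether the generation a reader would see
 * differs from the one published before the update (1 = it changed).
 */
static int
published_gen_changes(int shared)
{
	struct hands_sketch th0 = { 1 }, th1 = { 0 };
	uint32_t before = th0.gen;
	uint32_t after = shared ?
	    windup_shared_gen(&th0, &th1) : windup_own_gen(&th0, &th1);

	return after != before;
}
```

In this model the independent sequences publish generation 1 twice in a row, so a reader comparing generations cannot tell that the data changed; the shared sequence publishes 1 and then 2.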
# fecf25f8	16-Jul-2020	cheloha <cheloha@openbsd.org>	adjtime(2): distribute skew along arbitrary period on runtime clock The adjtime(2) adjustment is applied at up to 5000ppm/sec from tc_windup(). At the start of each UTC second, ntp_update_second() is called from tc_windup() and up to 5000ppm worth of skew is deducted from the timehands' th_adjtimedelta member and moved to the th_adjustment member. The resulting th_adjustment value is then mixed into the th_scale member and thus the system UTC time is slowly nudged in a particular direction. This works pretty well. The only issues have to do with the use of the edge of the UTC second as the start of the ntp_update_second() period: 1. If the UTC clock jumps forward we can get stuck in a loop calling ntp_update_second() from tc_windup(). We work around this with a magic number, LARGE_STEP. If the UTC clock jumps forward more than LARGE_STEP seconds we truncate the number of iterations to 2. Per the comment in tc_windup(), we do 2 iterations instead of 1 iteration to account for a leap second we may have missed. This is an anachronism: the OpenBSD kernel does not handle leap seconds anymore. Such jumps happen during settimeofday(2), during boot when we jump the clock from zero to the RTC time, and during resume when we jump the clock to the RTC time (again). They are unavoidable. 2. Changes to adjtime(2) are applied asynchronously. For example, if you try to cancel the ongoing adjustment... struct timeval zero = { 0, 0 }; adjtime(&zero, NULL); ... it can take up to one second for the adjustment to be cancelled. In the meantime, the skew continues. This delayed application is not intuitive or documented. 3. Adjustment is deducted from th_adjtimedelta across suspends of fewer than LARGE_STEP seconds, even though we do not skew the clock while we are suspended. This is unintuitive, incorrect, and undocumented. We can avoid all of these problems by applying the adjustment along an arbitrary period on the runtime clock instead of the UTC clock. 1. The runtime clock doesn't jump arbitrary amounts, so we never get stuck in a loop and we don't need a magic number to test for this possibility. With the removal of the magic number LARGE_STEP we can also remove the leap second handling from the tc_windup() code. 2. With a new timehands member, th_next_ntp_update, we can track when the next ntp_update_second() call should happen on the runtime clock. This value can be updated during the adjtime(2) system call, so changes to the skew happen immediately instead of up to one second after the adjtime(2) call. 3. The runtime clock does not jump across a suspend: no skew is deducted from th_adjtimedelta for any time we are offline and unable to adjust the clock. otto@ says the use of the runtime clock should not be a problem for ntpd(8) or the NTP algorithm in general.
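Note on fecf25f8: the rate limit described above (at most 5000ppm of skew per one-second period) can be sketched as a clamp applied once per period to the remaining adjustment. The sketch assumes the remaining adjustment is tracked in microseconds, so that 5000ppm of one second is 5000 microseconds; the function names are hypothetical and this is not the kernel's ntp_update_second().

```c
#include <stdint.h>

#define MAX_SKEW_USEC	5000	/* 5000ppm of a one-second period */

/* How much of the remaining adjustment may be applied in one period. */
static int64_t
clamp_skew(int64_t remaining_usec)
{
	if (remaining_usec > MAX_SKEW_USEC)
		return MAX_SKEW_USEC;
	if (remaining_usec < -MAX_SKEW_USEC)
		return -MAX_SKEW_USEC;
	return remaining_usec;
}

/*
 * Count how many one-second periods it takes to apply the whole
 * adjustment when each period deducts at most MAX_SKEW_USEC.
 */
static int
periods_to_drain(int64_t remaining_usec)
{
	int periods = 0;

	while (remaining_usec != 0) {
		remaining_usec -= clamp_skew(remaining_usec);
		periods++;
	}
	return periods;
}
```

Under these assumptions a 12000 microsecond adjustment is applied as 5000, 5000, and 2000 microseconds over three periods. Counting those periods on the runtime clock rather than on UTC seconds is what the commit changes.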
# d82e6535	06-Jul-2020	pirofti <pirofti@openbsd.org>	Add support for timecounting in userland. This diff exposes parts of clock_gettime(2) and gettimeofday(2) to userland via libc, liberating processes from the need for a context switch every time they want to count the passage of time. If a timecounter clock can be exposed to userland then it needs to set its tc_user member to a non-zero value. Tested with one or multiple counters per architecture. The timing data is shared through a pointer found in the new ELF auxiliary vector AUX_openbsd_timekeep containing timehands information that is frequently updated by the kernel. Timing differences between the last kernel update and the current time are adjusted in userland by the tc_get_timecount() function inside the MD usertc.c file. This permits a much more responsive environment, quite visible in browsers, office programs and gaming (apparently one is able to fly in Minecraft now). Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others! OK from at least kettenis@, cheloha@, naddy@, sthen@
# b609c616	04-Jul-2020	anton <anton@openbsd.org>	It's been agreed upon that global locks should be expressed using capital letters in locking annotations. Therefore harmonize the existing annotations. Also, if multiple locks are required they should be delimited using commas. ok mpi@
# e62bad27	02-Jul-2020	cheloha <cheloha@openbsd.org>	timecounting: make the dummy counter interrupt- and MP-safe The dummy counter should be deterministic with respect to interrupts and multiple threads of execution.
# 0d88cff5	26-Jun-2020	cheloha <cheloha@openbsd.org>	timecounting: deprecate time_second(9), time_uptime(9) time_second(9) has been replaced in the kernel by gettime(9). time_uptime(9) has been replaced in the kernel by getuptime(9). New code should use the replacement interfaces. They do not suffer from the split-read problem inherent to the time_* variables on 32-bit platforms. The variables remain in sys/kern/kern_tc.c for use via kvm(3) when examining kernel core dumps. This commit completes the deprecation process: - Remove the extern'd definitions for time_second and time_uptime from sys/time.h. - Replace manpage cross-references to time_second(9)/time_uptime(9) with references to microtime(9) or a related interface. - Move the time_second.9 manpage to the attic. With input from dlg@, kettenis@, visa@, and tedu@. ok kettenis@