History log of /openbsd/sys/kern/kern_tc.c (Results 1 – 25 of 83)
Revision Date Author Comments
# 127fa8d5 23-Feb-2024 cheloha <cheloha@openbsd.org>

timecounting: start system uptime at 0.0 instead of 1.0

OpenBSD starts the system uptime clock at 1.0 instead of 0.0. We
inherited this behavior from FreeBSD when we imported kern_tc.c.

patrick@ reports that this causes a problem in sdmmc(4) during boot:
the sdmmc_delay() call in sdmmc_init() doesn't block for the full
250ms. This happens because the system hardclock() starts at 0.0 and
executes about hz times, rapidly, to "catch up" to 1.0. This
instantly expires the first hz timeout ticks, hence the short sleep.

Starting the system uptime at 0.0 fixes the problem.
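A toy model makes the catch-up effect concrete. This is not OpenBSD's clock interrupt code: HZ = 100, the names hardclock_catch_up() and timeout_expires_early(), and the assumption that the first hardclock deadline sits at 0.0 are all illustrative. The model runs hardclock once per elapsed tick period until its deadline passes the current uptime, and reports whether a 250ms timeout (HZ / 4 ticks) armed before the catch-up has already expired.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code: HZ and all names are assumptions. */
#define HZ		100
#define NSEC_PER_SEC	1000000000LL
#define TICK_NSEC	(NSEC_PER_SEC / HZ)	/* one hardclock period */

/*
 * Run "hardclock" once for every period whose deadline is at or before
 * the current uptime, advancing *ticks each time.  Returns the number
 * of back-to-back runs.
 */
static int
hardclock_catch_up(int64_t uptime_nsec, int64_t *next_deadline, int *ticks)
{
	int runs = 0;

	while (*next_deadline <= uptime_nsec) {
		(*ticks)++;
		*next_deadline += TICK_NSEC;
		runs++;
	}
	return runs;
}

/*
 * Arm a 250ms timeout (HZ / 4 ticks) at boot, before any hardclock has
 * run, then let hardclock catch up to the boot-time uptime.  Returns
 * true if the timeout has already expired although no real time passed.
 */
static bool
timeout_expires_early(int64_t boot_uptime_nsec)
{
	int64_t next_deadline = 0;	/* assumed first deadline: 0.0 */
	int ticks = 0;
	int expiry = ticks + HZ / 4;

	hardclock_catch_up(boot_uptime_nsec, &next_deadline, &ticks);
	return ticks >= expiry;
}
```

With a starting uptime of 1.0 the model runs hardclock HZ + 1 times back to back and the 250ms timeout expires with no real time elapsed; with a starting uptime of 0.0 it runs once and the timeout stays pending.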

Prompted by patrick@. Tested by patrick@. In snaps since Feb 19 2024.

Thread: https://marc.info/?l=openbsd-tech&m=170830229732396&w=2

ok patrick@ deraadt@


# 24ee467d 04-Feb-2023 cheloha <cheloha@openbsd.org>

timecounting: remove incomplete PPS support

The timecounting code has had stubs for pulse-per-second (PPS) polling
since it was imported in 2004. At this point it seems unlikely that
anyone is going to finish adding PPS support, so let's remove the stubs:

- Delete the dead tc_poll_pps() call from tc_windup().
- Remove all tc_poll_pps symbols from the kernel.

Link: https://marc.info/?l=openbsd-tech&m=167519035723210&w=2

ok miod@


# 6fba7c69 13-Dec-2022 cheloha <cheloha@openbsd.org>

timecounting: add getbinruntime(), getnsecruntime()

The networking people want a fast, monotonic clock that only advances
while the system is not suspended. The runtime clock satisfies most
of these requirements, so introduce getnsecruntime() to provide a fast
means for reading it.
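Conceptually, the runtime clock is the uptime clock with the time spent suspended subtracted out. The sketch below shows that arithmetic on a simplified bintime-like struct; struct bt, bt_sub(), bt_to_nsec() and nsec_runtime() are hypothetical names, and the sketch assumes the suspended time is tracked as a separate naptime value (as the timehands do) rather than reproducing the actual getnsecruntime() code.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bintime: seconds plus 2^-64 s units. */
struct bt {
	int64_t		sec;
	uint64_t	frac;
};

/* Compute a - b, borrowing a second when the fraction underflows. */
static struct bt
bt_sub(struct bt a, struct bt b)
{
	struct bt r;

	r.sec = a.sec - b.sec;
	r.frac = a.frac - b.frac;
	if (r.frac > a.frac)		/* unsigned wraparound: borrow */
		r.sec--;
	return r;
}

/* Convert to nanoseconds, truncating the fraction. */
static uint64_t
bt_to_nsec(struct bt t)
{
	return (uint64_t)t.sec * 1000000000ULL +
	    (((t.frac >> 32) * 1000000000ULL) >> 32);
}

/* Hypothetical runtime: uptime with the suspended time (naptime) removed. */
static uint64_t
nsec_runtime(struct bt uptime, struct bt naptime)
{
	return bt_to_nsec(bt_sub(uptime, naptime));
}
```

For example, an uptime of 100.5 s with 40.25 s spent suspended yields a runtime of 60.25 s (60250000000 ns).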

Based on patches from jca@ and claudio@.

ok yasuoka@


# 2b46a8cb 05-Dec-2022 deraadt <deraadt@openbsd.org>

zap a pile of dangling tabs


# a6064e19 08-Nov-2022 cheloha <cheloha@openbsd.org>

tc_setclock: don't print a warning if tc_windup() rejects inittodr(9) time

During resume, it isn't necessarily a problem if the UTC time we get
from inittodr(9) lags behind the system UTC clock. In particular, if
the active timecounter's frequency is low enough, tc_delta() might not
overflow across a brief suspend.
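Whether the counter wraps during a suspend depends on its width and frequency. The sketch below shows the masked subtraction a tc_delta()-style helper performs on a free-running counter, and how long such a counter takes to wrap; counter_delta() and wrap_period_sec() are hypothetical names. A 32-bit counter at 32768 Hz, for instance, takes 131072 seconds (about 36 hours) to wrap, so a brief suspend leaves the delta meaningful, while the same 32-bit counter at 3 GHz wraps in under two seconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: elapsed ticks between two reads of a free-running
 * counter, masked to the counter's width.  Wraparound between the reads
 * is handled, but only the elapsed ticks modulo (mask + 1) survive: a
 * counter that wraps completely between the reads loses whole periods.
 */
static uint32_t
counter_delta(uint32_t now, uint32_t then, uint32_t mask)
{
	return (now - then) & mask;
}

/* Seconds it takes a counter with this mask and frequency to wrap. */
static double
wrap_period_sec(uint32_t mask, uint64_t freq_hz)
{
	return ((double)mask + 1.0) / (double)freq_hz;
}
```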

Remove the misleading warning message. The code is behaving as
intended, just not in a way I anticipated when I added the warning
message a few years ago.

Discovered by kettenis@. Root cause isolated with kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=166790845619897&w=2

ok mlarkin@ kettenis@


# 4c0ab428 18-Sep-2022 cheloha <cheloha@openbsd.org>

timecounting: tc_reset_quality: print notice if active counter changes

Give the user a hint as to what happened if they boot up and the TSC
is not the active counter.

"sure" deraadt@


# 78156938 12-Aug-2022 cheloha <cheloha@openbsd.org>

amd64: simplify TSC synchronization testing

Computing a per-CPU TSC skew value is error-prone, especially on
multisocket machines and VMs. My best guess is that larger latencies
appear to the current skew measurement test as TSC desync, and so the
TSC is demoted to a kernel timecounter on these machines or marked
non-monotonic.

This patch eliminates per-CPU TSC skew values. Instead of trying to
measure and correct for TSC desync we only try to detect desync, which
is less error-prone. This approach should allow a wider variety of
machines to use the TSC as a timecounter when running OpenBSD.

In the new sync test, both CPUs repeatedly try to detect whether their
TSC is trailing the other CPU's TSC. The upside to this approach is
that it yields no false positives. The downside to this approach is
that it takes more time than the current skew measurement test. Each
test round takes 1ms, and we run up to two rounds per CPU, so this
patch slows boot down by 2ms per AP.
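The detection idea can be modeled deterministically. In the sketch below, tsc_desync_test() and the integer time units are hypothetical, and the real test runs on two physical CPUs: here real time advances by one unit per read and each CPU sees real time plus a fixed offset. The CPUs alternate; each reads its own counter and compares it with the value the other CPU published most recently. That value was read earlier in real time, so a synchronized counter can never be smaller; if it is, the reader's counter is trailing. The model also shows the limit of the approach: a skew smaller than the delay between reads goes undetected.

```c
#include <stdint.h>

/*
 * Toy model of a two-CPU desync test.  Real time advances by one unit
 * per read; CPU i observes real time plus offset[i].  Returns a bitmask
 * of the CPUs that caught themselves trailing the other CPU.
 */
static unsigned
tsc_desync_test(const int64_t offset[2], int steps)
{
	int64_t last[2];	/* most recent value published by each CPU */
	int64_t now = 0;	/* simulated real time */
	unsigned trailing = 0;

	/* Each CPU publishes an initial reading. */
	last[0] = now++ + offset[0];
	last[1] = now++ + offset[1];

	for (int step = 0; step < steps; step++) {
		int cpu = step % 2;
		int other = 1 - cpu;
		int64_t mine = now++ + offset[cpu];

		/* The other CPU's value is older, so mine must not be smaller. */
		if (mine < last[other])
			trailing |= 1u << cpu;
		last[cpu] = mine;
	}
	return trailing;
}
```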

If any CPU fails the sync test, the TSC is marked non-monotonic and a
different timecounter is activated. The TC_USER flag remains intact.
There is no middle ground where we fall back to only using the TSC in
the kernel.

Before running the test, we check for the IA32_TSC_ADJUST register and
reset it if necessary. This is a trivial way to work around firmware
bugs that desync the TSC before we reach the kernel. Unfortunately,
at the moment this register appears to only be available on Intel
processors. I cannot find an equivalent but differently-named MSR for
AMD processors.

Because there is no per-CPU skew value, there is also no concept of
TSC drift anymore.

Miscellaneous notes:

- This patch adds a new timecounter utility function, tc_reset_quality().
Used after sync test failure to mark the TSC non-monotonic.

- I have left TSC_DEBUG enabled for now. Unsure if we should leave it
enabled for release or not. If we disable it we no longer run the
sync test after failing it once. Running the test even after failure
provides information about the desync on every CPU.

- Taking 1ms per test round is fairly conservative. We can experiment
with and discuss shorter test rounds. My main goal with a relatively
long test round is ensuring VMs actually run the test. It would be
bad if a hypervisor interrupted the test for so long that it concealed
desync.

- The use of two test rounds is mostly a diagnostic tool: it would be
very strange if a CPU passed the first round but failed the second.
If we ever saw this in the wild it would indicate something odd.

- Most of the desync seen in test reports is on Ryzen CPUs. I
believe, but cannot prove, that this is due to a widespread
firmware bug on AMD motherboards. Hopefully AMD and/or the
downstream vendors fix it.

- Fixing TSC desync by writing the TSC directly with WRMSR is very
difficult. The TSC is a moving target incrementing very quickly and
compensating for WRMSR overhead is non-trivial. We can experiment
with this, but my confidence is low that we can make it work reliably.

Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by
deraadt@ throughout. Reprompted by Yuichiro Naito several times.
With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@.

Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan,
Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@,
Renato Aguiar, and Timo Myyra.

Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2

"just commit it" deraadt@


# 83dc7839 23-Jul-2022 cheloha <cheloha@openbsd.org>

timecounting: use full 96-bit product when computing elapsed time

The timecounting subsystem computes elapsed time by scaling (64 bits)
the difference between two counter values (32 bits at most) up into a
struct bintime (128 bits).

Under normal circumstances it is sufficient to do this with 64-bit
multiplication, like this:

	struct bintime bt;

	bt.sec = 0;
	bt.frac = th->th_scale * tc_delta(th);

However, if tc_delta() exceeds 1 second's worth of counter ticks, that
multiplication overflows. The result is that the monotonic clock appears
to jump backwards.

When can this happen? In practice, I have seen it when trying to
compile LLVM on an EdgeRouter Lite when using an SD card as the
backing disk. The box gets stuck in swap, the hardclock(9) is
delayed, and we appear to "lose time".

To avoid this overflow we need to compute the full 96-bit product of
the delta and the scale.

This commit adds TIMECOUNT_TO_BINTIME(), a function for computing that
full product, to sys/time.h. The patch puts the new function to use
in lib/libc/sys/microtime.c and sys/kern/kern_tc.c.

(The commit also reorganizes some of our high resolution bintime code
so that we always read the timecounter first.)
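The splitting trick behind a full 96-bit product can be sketched with plain 64-bit arithmetic. This is not a copy of TIMECOUNT_TO_BINTIME(): struct bt and both function names are hypothetical. Splitting the 64-bit scale into 32-bit halves keeps each partial product within 64 bits, and the high partial product contributes whole seconds as well as fraction bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bintime. */
struct bt {
	int64_t		sec;
	uint64_t	frac;	/* units of 2^-64 seconds */
};

/* Naive 64-bit product: wraps silently once delta * scale >= 2^64. */
static struct bt
timecount_to_bt_naive(uint32_t delta, uint64_t scale)
{
	struct bt r = { 0, scale * delta };

	return r;
}

/*
 * Full 96-bit product.  With scale = hi * 2^32 + lo, each partial
 * product delta * hi and delta * lo fits in 64 bits:
 *   delta * scale = (delta * hi) * 2^32 + delta * lo
 * The top 32 bits of (delta * hi) become whole seconds.
 */
static struct bt
timecount_to_bt(uint32_t delta, uint64_t scale)
{
	uint64_t hi = (uint64_t)delta * (scale >> 32);
	uint64_t lo = (uint64_t)delta * (scale & 0xffffffff);
	struct bt r;

	r.sec = hi >> 32;
	r.frac = hi << 32;
	r.frac += lo;
	if (r.frac < lo)	/* carry out of the fraction */
		r.sec++;
	return r;
}
```

With a scale of UINT64_MAX / 1000 (roughly one millisecond per tick) and a delta of 2500 ticks, the full product gives 2 seconds plus a fraction just under one half, about 2.5 s. The naive version returns sec = 0 with the same fraction, about 0.5 s: the clock appears to jump backwards.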

Doing the full 96-bit multiplication is between 0% and 15% slower than
doing the cheaper 64-bit multiplication on amd64. Measuring a precise
difference is extremely difficult because the computation is already
quite fast.

I would guess that the cost is slightly higher than that on 32-bit
platforms. Nobody ever volunteered to test, so this remains a guess.

Thread: https://marc.info/?l=openbsd-tech&m=163424607918042&w=2
6 month bump: https://marc.info/?l=openbsd-tech&m=165124251401342&w=2

Committed after 9 months without review.


# da571ddd 24-Oct-2021 jsg <jsg@openbsd.org>

use NULL not 0 for pointer values in kern
ok semarie@


# 2f582782 19-Jun-2021 cheloha <cheloha@openbsd.org>

timecounting: add FRAC_TO_NSEC(), BINTIME_TO_NSEC()

Refactor the fraction-to-nanosecond conversion from BINTIME_TO_TIMESPEC()
into a dedicated routine, FRAC_TO_NSEC(), so we can reuse it elsewhere.

Then add a new BINTIME_TO_NSEC() function to sys/time.h to deduplicate
conversion code in nsecuptime(), getnsecuptime(), and tc_setclock().
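The two conversions can be sketched as follows, assuming (as in struct bintime) that the fraction counts units of 2^-64 seconds. frac_to_nsec(), bt_to_nsec() and struct bt are hypothetical stand-ins for FRAC_TO_NSEC(), BINTIME_TO_NSEC() and struct bintime, not their actual definitions. Dropping the low 32 bits of the fraction before multiplying by 10^9 keeps the product within 64 bits; the result is truncated and can fall short of the exact value by a little over one nanosecond at most.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct bintime. */
struct bt {
	int64_t		sec;
	uint64_t	frac;	/* units of 2^-64 seconds */
};

/*
 * Fraction of a second to nanoseconds.  Only the top 32 bits of the
 * fraction are used so that multiplying by 10^9 cannot overflow.
 */
static uint32_t
frac_to_nsec(uint64_t frac)
{
	return ((frac >> 32) * 1000000000ULL) >> 32;
}

/* Whole bintime-like value to nanoseconds. */
static uint64_t
bt_to_nsec(const struct bt *bt)
{
	return (uint64_t)bt->sec * 1000000000ULL + frac_to_nsec(bt->frac);
}
```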

Thread: https://marc.info/?l=openbsd-tech&m=162376993926751&w=2

ok dlg@


# 4ea72498 15-Jun-2021 dlg <dlg@openbsd.org>

factor out nsecuptime and getnsecuptime.

these functions were implemented in a bunch of places with comments
saying it should be moved to kern_tc.c when more pop up, and i was
about to add another one. i think it's time to move them to kern_tc.c.

ok cheloha@ jmatthew@


# b32486e3 30-Apr-2021 bluhm <bluhm@openbsd.org>

Rearrange the implementation of bounded sysctl. The primitive
functions are sysctl_int() and sysctl_rdint(). This brings us back to
the 4.4BSD implementation. Then sysctl_int_bounded() builds the
magic for range checks on top. sysctl_bounded_arr() is a wrapper
around it to support multiple variables.
Introduce macros that describe the meaning of the magic boundary
values. Use these macros in obvious places.
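The layering can be sketched in user-space C. None of this is the kernel's sysctl code: set_int(), set_int_bounded() and INT_READONLY are hypothetical; the real primitives copy values to and from user space rather than through plain pointers; and the rule that a minimum greater than the maximum marks a variable read-only is an assumed example of a "magic boundary value".

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical primitive, in the spirit of sysctl_int(): report the old
 * value and store a new one if supplied.  The real primitive copies to
 * and from user space; plain pointers keep this sketch self-contained.
 */
static int
set_int(int *oldp, const int *newp, int *valp)
{
	if (oldp != NULL)
		*oldp = *valp;
	if (newp != NULL)
		*valp = *newp;
	return 0;
}

/*
 * Hypothetical "magic boundary" macro: by the convention assumed here,
 * a minimum greater than the maximum means the variable is read-only.
 */
#define INT_READONLY	1, 0

/* Range checks layered on top of the primitive. */
static int
set_int_bounded(int *oldp, const int *newp, int *valp, int minimum,
    int maximum)
{
	if (newp != NULL) {
		if (minimum > maximum)
			return EPERM;
		if (*newp < minimum || *newp > maximum)
			return EINVAL;
	}
	return set_int(oldp, newp, valp);
}
```

Because INT_READONLY expands to two arguments, a call site can read set_int_bounded(oldp, newp, &var, INT_READONLY), which names the intent instead of spelling out the magic pair.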
input and OK gnezdo@ mvs@


# 8611d3cd 23-Feb-2021 cheloha <cheloha@openbsd.org>

timecounting: use C99-style initialization for all timecounter structs

The timecounter struct is large and I think it may change in the
future. Changing it later will be easier if we use C99-style
initialization for all timecounter structs. It also makes reading the
code a bit easier.
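A minimal sketch of C99 designated initialization follows, using a hypothetical, trimmed-down struct rather than the real struct timecounter; its member names only loosely echo the real ones. Each member is named explicitly, so the initializer stays correct if members are reordered or added, and any member left out is zero-initialized.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down counterpart of struct timecounter. */
struct counter_desc {
	uint32_t	(*get_count)(struct counter_desc *);
	uint32_t	counter_mask;
	uint64_t	frequency;
	const char	*name;
	int		quality;
	void		*priv;
	int		user;
};

static uint32_t
fake_get_count(struct counter_desc *cd)
{
	(void)cd;		/* unused in this stub */
	return 42;
}

/*
 * Each member is named, so this stays correct if members are reordered
 * or new ones are added.  Members not mentioned (priv, user) are
 * zero-initialized.
 */
static struct counter_desc fake_counter = {
	.get_count = fake_get_count,
	.counter_mask = 0xffffffff,
	.frequency = 1000000,
	.name = "fake",
	.quality = 100,
};
```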

For reasons I cannot explain, switching to C99-style initialization
sometimes changes the hash of the resulting object file, even though
the resulting struct should be the same. So there is a binary change
here, but only sometimes. No behavior should change in either case.

I can't compile-test this everywhere but I have been staring at the
diff for days now and I'm relatively confident this will not break
compilation. Fingers crossed.

ok gnezdo@


# e0324b5e 05-Dec-2020 gnezdo <gnezdo@openbsd.org>

Convert sysctl_tc to sysctl_bounded_arr

ok gkoehler@


# 97c55bcc 16-Sep-2020 cheloha <cheloha@openbsd.org>

timecounting: provide a naptime variable for userspace via kvm_read(3)

vmstat(8) uses kvm_read(3) to extract the naptime from the kernel.
Problem is, I deleted `naptime' from the global namespace when I moved
it into the timehands. This patch restores it. It gets updated from
tc_windup(). Only userspace should use it, and only when the kernel
is dead.

We need to tweak a variable in tc_setclock() to avoid shadowing the
(once again) global naptime.


# 6e581dd8 20-Jul-2020 deraadt <deraadt@openbsd.org>

ramdisks got broken by that last diff.


# 04cecb01 20-Jul-2020 cheloha <cheloha@openbsd.org>

timecounting: add missing mutex assertion to tc_update_timekeep()


# 1fb8cdb7 20-Jul-2020 cheloha <cheloha@openbsd.org>

timecounting: misc. cleanup in tc_setclock() and tc_setrealtimeclock()

- Use real variable names like "utc" and "uptime" instead of non-names
like "bt" and "bt2"
- Move the TIMESPEC_TO_BINTIME(9) conversions out of the critical
section
- Sprinkle in a little whitespace
- Sort automatic variables according to style(9)


# 1988fbea 19-Jul-2020 cheloha <cheloha@openbsd.org>

tc_windup(): remove misleading comment about getmicrotime(9)

Using getmicrotime(9) or getnanotime(9) is perfectly appropriate in
certain contexts. The programmer needs to weigh the overhead savings
against the reduced accuracy and decide whether the low-res interfaces
are appropriate.


# 63cc33c4 17-Jul-2020 gkoehler <gkoehler@openbsd.org>

Read ogen from the other timehands; fixes tk_generation

If th0.th_generation == th1.th_generation when we update the user
timekeep page, then tk_generation doesn't change, so libc may
calculate the wrong time. Now th0 and th1 share the sequence so
th0.th_generation != th1.th_generation.
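A reduced model of the fix: two timehands alternate, readers treat generation 0 as "update in progress", and each update takes its new generation from the currently active timehands rather than from the one being overwritten. struct hands, hands_init() and hands_windup() are hypothetical names, and the time fields are omitted.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down timehands: only the generation matters here. */
struct hands {
	uint32_t	gen;		/* 0 means "update in progress" */
	struct hands	*next;
};

static struct hands h0, h1;
static struct hands *current;

static void
hands_init(void)
{
	h0 = (struct hands){ .gen = 1, .next = &h1 };
	h1 = (struct hands){ .gen = 0, .next = &h0 };
	current = &h0;
}

/*
 * Publish an update into the inactive timehands.  ogen is read from the
 * active (other) timehands, so h0 and h1 share one sequence and two
 * consecutive updates never carry the same generation.
 */
static uint32_t
hands_windup(void)
{
	struct hands *th = current->next;
	uint32_t ogen = current->gen;

	th->gen = 0;			/* readers must retry */
	/* ... the time fields of *th would be updated here ... */
	if (++ogen == 0)		/* skip the reserved value on wraparound */
		ogen = 1;
	th->gen = ogen;
	current = th;
	return ogen;
}
```

In this model a generation copied from the active timehands into a page like the user timekeep changes on every update, so a reader comparing generations always notices new data.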

ok kettenis@ cheloha@


# fecf25f8 16-Jul-2020 cheloha <cheloha@openbsd.org>

adjtime(2): distribute skew along arbitrary period on runtime clock

The adjtime(2) adjustment is applied at up to 5000ppm/sec from
tc_windup(). At the start of each UTC second, ntp_update_second() is
called from tc_windup() and up to 5000ppm worth of skew is deducted
from the timehands' th_adjtimedelta member and moved to the
th_adjustment member. The resulting th_adjustment value is then mixed
into the th_scale member and thus the system UTC time is slowly nudged
in a particular direction.

This works pretty well. The only issues have to do with the use of
the edge of the UTC second as the start of the ntp_update_second()
period:

1. If the UTC clock jumps forward we can get stuck in a loop calling
ntp_update_second() from tc_windup(). We work around this with
a magic number, LARGE_STEP. If the UTC clock jumps forward more
than LARGE_STEP seconds we truncate the number of iterations to 2.

Per the comment in tc_windup(), we do 2 iterations instead of 1
iteration to account for a leap second we may have missed. This is
an anachronism: the OpenBSD kernel does not handle leap seconds
anymore.

Such jumps happen during settimeofday(2), during boot when we jump
the clock from zero to the RTC time, and during resume when we jump
the clock to the RTC time (again). They are unavoidable.

2. Changes to adjtime(2) are applied asynchronously. For example, if
you try to cancel the ongoing adjustment...

	struct timeval zero = { 0, 0 };

	adjtime(&zero, NULL);

... it can take up to one second for the adjustment to be cancelled.
In the meantime, the skew continues. This delayed application is not
intuitive or documented.

3. Adjustment is deducted from th_adjtimedelta across suspends of fewer
than LARGE_STEP seconds, even though we do not skew the clock while
we are suspended. This is unintuitive, incorrect, and undocumented.

We can avoid all of these problems by applying the adjustment along
an arbitrary period on the runtime clock instead of the UTC clock.

1. The runtime clock doesn't jump arbitrary amounts, so we never get
stuck in a loop and we don't need a magic number to test for this
possibility. With the removal of the magic number LARGE_STEP we
can also remove the leap second handling from the tc_windup() code.

2. With a new timehands member, th_next_ntp_update, we can track when
the next ntp_update_second() call should happen on the runtime clock.
This value can be updated during the adjtime(2) system call, so
changes to the skew happen *immediately* instead of up to one second
after the adjtime(2) call.

3. The runtime clock does not jump across a suspend: no skew is
deducted from th_adjtimedelta for any time we are offline and
unable to adjust the clock.
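The runtime-clock approach can be sketched as a small state machine. struct adj_state, adj_update() and adj_set() are hypothetical, and the skew is kept in plain microseconds rather than mixed into a scale factor the way th_adjustment is. 5000ppm of one second is 5000 microseconds, so at most that much skew moves from the pending delta into the current period's adjustment per second of runtime.

```c
#include <stdint.h>

#define MAX_SKEW_USEC	5000		/* 5000ppm of one second */
#define NSEC_PER_SEC	1000000000LL

/* Hypothetical per-timehands state for this sketch. */
struct adj_state {
	int64_t	adjtimedelta_usec;	/* skew still to be applied */
	int64_t	adjustment_usec;	/* skew applied in the current period */
	int64_t	next_update_nsec;	/* runtime at which the next period starts */
};

/*
 * Start a new period for every period boundary reached.  Assuming
 * runtime_nsec comes from the runtime clock, which does not jump on
 * settimeofday(2) or across a suspend, the loop only catches up on
 * periods that actually elapsed while the system was running, so no
 * LARGE_STEP-style cap is needed.
 */
static void
adj_update(struct adj_state *s, int64_t runtime_nsec)
{
	while (s->next_update_nsec <= runtime_nsec) {
		int64_t step = s->adjtimedelta_usec;

		if (step > MAX_SKEW_USEC)
			step = MAX_SKEW_USEC;
		else if (step < -MAX_SKEW_USEC)
			step = -MAX_SKEW_USEC;
		s->adjtimedelta_usec -= step;
		s->adjustment_usec = step;
		s->next_update_nsec += NSEC_PER_SEC;
	}
}

/* adjtime(2)-style entry point: restart the period so changes apply now. */
static void
adj_set(struct adj_state *s, int64_t delta_usec, int64_t runtime_nsec)
{
	s->adjtimedelta_usec = delta_usec;
	s->next_update_nsec = runtime_nsec;
	adj_update(s, runtime_nsec);
}
```

Arming a 12000-microsecond adjustment applies 5000 microseconds in each of the first two periods and 2000 in the third; cancelling it half a second in takes effect immediately, because adj_set() restarts the period instead of waiting for the next boundary.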

otto@ says the use of the runtime clock should not be a problem for
ntpd(8) or the NTP algorithm in general.


# d82e6535 06-Jul-2020 pirofti <pirofti@openbsd.org>

Add support for timecounting in userland.

This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, freeing processes from the need for a context
switch every time they want to count the passage of time.

If a timecounter clock can be exposed to userland, then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.

The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.

Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
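A sketch of the userland side: start from the kernel's last published snapshot and add the counter ticks elapsed since then, multiplied by the scale. The snapshot layout (struct tk_snapshot), bt_add_scaled() and tk_bintime() are hypothetical; the count argument stands in for the value the MD tc_get_timecount() would return, and real code also needs memory barriers around the generation checks.

```c
#include <stdint.h>

/* Hypothetical snapshot of what the kernel publishes for userland. */
struct tk_snapshot {
	uint32_t	gen;		/* 0 while the kernel is updating */
	uint64_t	scale;		/* 2^-64 s per counter tick */
	uint32_t	offset_count;	/* counter value at the last update */
	uint32_t	counter_mask;
	int64_t		offset_sec;	/* time at the last update */
	uint64_t	offset_frac;
};

struct bt {
	int64_t		sec;
	uint64_t	frac;
};

/* Add delta * scale (a 96-bit product) to *bt, carrying into sec. */
static void
bt_add_scaled(struct bt *bt, uint32_t delta, uint64_t scale)
{
	uint64_t hi = (uint64_t)delta * (scale >> 32);
	uint64_t lo = (uint64_t)delta * (scale & 0xffffffff);
	uint64_t old;

	bt->sec += hi >> 32;
	old = bt->frac;
	bt->frac += hi << 32;
	if (bt->frac < old)
		bt->sec++;
	old = bt->frac;
	bt->frac += lo;
	if (bt->frac < old)
		bt->sec++;
}

/*
 * Returns 0 and fills *out on success, or -1 if the snapshot was being
 * updated and the caller should retry.
 */
static int
tk_bintime(const struct tk_snapshot *tk, uint32_t count, struct bt *out)
{
	uint32_t gen = tk->gen;
	struct bt bt;

	if (gen == 0)
		return -1;
	bt.sec = tk->offset_sec;
	bt.frac = tk->offset_frac;
	bt_add_scaled(&bt, (count - tk->offset_count) & tk->counter_mask,
	    tk->scale);
	if (tk->gen != gen)
		return -1;
	*out = bt;
	return 0;
}
```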

This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).

Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!

OK from at least kettenis@, cheloha@, naddy@, sthen@


# b609c616 04-Jul-2020 anton <anton@openbsd.org>

It's been agreed upon that global locks should be expressed using
capital letters in locking annotations. Therefore harmonize the existing
annotations.

Also, if multiple locks are required they should be delimited using
commas.

ok mpi@


# e62bad27 02-Jul-2020 cheloha <cheloha@openbsd.org>

timecounting: make the dummy counter interrupt- and MP-safe

The dummy counter should be deterministic with respect to interrupts
and multiple threads of execution.


# 0d88cff5 26-Jun-2020 cheloha <cheloha@openbsd.org>

timecounting: deprecate time_second(9), time_uptime(9)

time_second(9) has been replaced in the kernel by gettime(9).
time_uptime(9) has been replaced in the kernel by getuptime(9).

New code should use the replacement interfaces. They do not suffer
from the split-read problem inherent to the time_* variables on 32-bit
platforms.
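The split-read problem can be reproduced deterministically. The sketch below (struct split64, join() and torn_read() are hypothetical) stores a 64-bit value as two 32-bit words, as a 32-bit CPU typically must when it uses ordinary loads and stores, and interleaves a reader's two loads with a writer's two stores. The reader ends up with a value that was never stored. The tear only matters when an update carries into the high word, which makes it rare but not impossible.

```c
#include <stdint.h>

/* Hypothetical 64-bit value held as two 32-bit words. */
struct split64 {
	uint32_t	lo;
	uint32_t	hi;
};

static uint64_t
join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * The reader loads the low word, the writer then stores both words of
 * new_value, and finally the reader loads the high word.  Returns the
 * value the reader assembled.
 */
static uint64_t
torn_read(struct split64 *v, uint64_t new_value)
{
	uint32_t lo = v->lo;			/* reader: low word */

	v->lo = (uint32_t)new_value;		/* writer: low word... */
	v->hi = (uint32_t)(new_value >> 32);	/* ...then high word */

	return join(v->hi, lo);			/* reader: high word */
}
```

Starting from 0xFFFFFFFF and writing 0x100000000, the reader assembles 0x1FFFFFFFF, which is neither the old nor the new value.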

The variables remain in sys/kern/kern_tc.c for use via kvm(3) when
examining kernel core dumps.

This commit completes the deprecation process:

- Remove the extern'd definitions for time_second and time_uptime
from sys/time.h.
- Replace manpage cross-references to time_second(9)/time_uptime(9)
with references to microtime(9) or a related interface.
- Move the time_second.9 manpage to the attic.

With input from dlg@, kettenis@, visa@, and tedu@.

ok kettenis@
