History log of /openbsd/sys/arch/arm64/include/cpu.h (Results 1 – 25 of 50)
Revision Date Author Comments
# e8331b74 24-Jul-2024 kettenis <kettenis@openbsd.org>

If the CPU cores implement FEAT_IDST, emulate access to the CPU ID
registers from userland and set HWCAP_CPUID. This will allow detection
of features to be introduced into the architecture in the future without
allocating new HWCAP_xxx or HWCAP2_xxx bits. We provide the same
sanitized view of the CPU ID registers as is currently available through
sysctl(2).

Note that this introduces an unconditional read of ID_AA64MMFR2_EL1. This
is known to cause problems on older versions of QEMU. If this turns out
to be a problem in cases where updating QEMU is not an option, we'll have
to implement a workaround.

Also note that since we don't emulate the CPU ID registers on older cores,
this means that microarchitectural optimizations keyed off reads of MIDR_EL1
are not possible on OpenBSD. I don't think that is a real problem.

ok jca@


# aeddddc8 17-Jul-2024 kettenis <kettenis@openbsd.org>

Clean up the cpu_id_aa64xxx variables at the end of autoconf such that
sysctl(2) and ID register access emulation can share the variables.

ok jca@


# 07eca602 10-Jul-2024 kettenis <kettenis@openbsd.org>

Implement support for deeper idle states offered by PSCI. Reduces the
idle power usage of the Vivobook S15 by almost 50%.

ok patrick@


# 82673a18 01-May-2024 mpi <mpi@openbsd.org>

Add per-CPU caches to the pmemrange allocator.

The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.

Each cache is composed of two magazines. The design is borrowed from Jeff Bonwick's
vmem paper and the implementation is similar to that of pool_cache from
dlg@. However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.

This version includes splvm()/splx() dances because the buffer cache flips
buffers in interrupt context. So we have to prevent recursive accesses to
per-CPU magazines.

Tested by naddy@, solene@, krw@, robert@, claudio@ and Laurence Tratt.

ok claudio@, kettenis@
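
The two-magazine scheme described above can be sketched in plain C. This is a minimal illustration, not the uvm code: the names `percpu_cache`, `cache_get`, `cache_put`, `backing_alloc`, and the capacity `MAG_SIZE` are all hypothetical, and the backing allocator is a static array standing in for the pmemrange allocator. It leaves out the splvm()/splx() protection that the commit says is needed in the kernel.

```c
#include <stddef.h>

#define MAG_SIZE	8	/* hypothetical magazine capacity */

struct magazine {
	void	*items[MAG_SIZE];
	int	 count;
};

struct percpu_cache {
	struct magazine	 mags[2];
	int		 cur;	/* index of the active magazine */
};

/* Stand-in for the backing allocator: hands out slots of a static pool. */
static char backing_pool[64][16];
static int backing_next;

static void *
backing_alloc(void)
{
	if (backing_next >= 64)
		return NULL;
	return backing_pool[backing_next++];
}

/* Refill a magazine directly from the backing allocator (no depot layer). */
static void
mag_refill(struct magazine *m)
{
	void *p;

	while (m->count < MAG_SIZE && (p = backing_alloc()) != NULL)
		m->items[m->count++] = p;
}

/* Take one item from the cache, or NULL if nothing is available. */
void *
cache_get(struct percpu_cache *pc)
{
	struct magazine *m = &pc->mags[pc->cur];

	if (m->count == 0) {
		/* Active magazine is empty: switch to the other one. */
		pc->cur = !pc->cur;
		m = &pc->mags[pc->cur];
		if (m->count == 0)
			mag_refill(m);
		if (m->count == 0)
			return NULL;
	}
	return m->items[--m->count];
}

/* Return an item to the cache; -1 means both magazines are full. */
int
cache_put(struct percpu_cache *pc, void *p)
{
	struct magazine *m = &pc->mags[pc->cur];

	if (m->count == MAG_SIZE) {
		pc->cur = !pc->cur;
		m = &pc->mags[pc->cur];
		if (m->count == MAG_SIZE)
			return -1;	/* caller hands p back to the allocator */
	}
	m->items[m->count++] = p;
	return 0;
}
```

Keeping a second magazine means that a workload alternating between allocation and free near a magazine boundary swaps magazines instead of going to the backing allocator every time.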


# 08c42a48 29-Apr-2024 jsg <jsg@openbsd.org>

remove prototypes for removed functions


# 097a266d 19-Apr-2024 mpi <mpi@openbsd.org>

Revert per-CPU caches; a double-free has been found by naddy@.


# 52feabc5 17-Apr-2024 mpi <mpi@openbsd.org>

Add per-CPU caches to the pmemrange allocator.

The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.

Each cache is composed of two magazines. The design is borrowed from Jeff Bonwick's
vmem paper and the implementation is similar to that of pool_cache from
dlg@. However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.

Tested by robert@, claudio@ and Laurence Tratt.

ok kettenis@


# c737cf90 25-Feb-2024 cheloha <cheloha@openbsd.org>

clockintr: rename "struct clockintr_queue" to "struct clockqueue"

The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.


# 1d970828 24-Jan-2024 cheloha <cheloha@openbsd.org>

clockintr: switch from callee- to caller-allocated clockintr structs

Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller. mpi@ says this behavior is
incompatible with dt(4). In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.

To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.

clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects. The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".

Requested by mpi@. Tweaked by mpi@.

Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2

ok claudio@ mlarkin@ mpi@
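
The difference between callee- and caller-allocated objects can be illustrated with a small generic sketch. It does not reproduce the clockintr API: `timer_obj`, `timer_bind`, and `consumer` are hypothetical names chosen for the example. The point is that a "bind" routine only initializes storage the caller already owns, so it needs no allocator and cannot fail for lack of memory.

```c
#include <stddef.h>

/* Hypothetical timer object; the caller provides the storage. */
struct timer_obj {
	void	(*fn)(void *);
	void	*arg;
	int	 bound;
};

/* Initialize caller-owned storage. No allocation takes place here. */
void
timer_bind(struct timer_obj *t, void (*fn)(void *), void *arg)
{
	t->fn = fn;
	t->arg = arg;
	t->bound = 1;
}

/* A consumer embeds the object in its own state. */
struct consumer {
	int			hits;
	struct timer_obj	timer;
};

static void
consumer_tick(void *arg)
{
	struct consumer *c = arg;

	c->hits++;
}

void
consumer_init(struct consumer *c)
{
	c->hits = 0;
	timer_bind(&c->timer, consumer_tick, c);
}
```

Because the object lives inside the consumer's own structure, its lifetime follows the consumer's, which is the property the commit relies on for dt(4) PCBs.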


# a84df9b3 15-Jan-2024 kettenis <kettenis@openbsd.org>

We can't call kstat_create(9) when bringing up the secondary CPUs as it
uses an rwlock and curproc isn't initialized yet for these CPUs at this
point. As a result we hit a "locking against myself" panic if there is
any lock contention.

Fix this by adding a new ci_midr member to struct cpu_info which gets
initialized when we identify the CPUs and use that to attach the kstat
stuff.

ok tobhe@, dlg@


# 4ef70b62 26-Dec-2023 kettenis <kettenis@openbsd.org>

Improve handling of SError interrupts. Print some useful information and
allow additional information to be printed for specific CPU types. Use
this to print the L2C registers on Apple CPUs which can be very useful
in tracking down the source of certain SError interrupts.

ok miod@, dlg@


# 11d1f9b2 23-Aug-2023 cheloha <cheloha@openbsd.org>

all platforms: separate cpu_initclocks() from cpu_startclock()

To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.

Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.

- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:

+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().

+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.

These guidelines will shift in the future, but that's the way things
stand as of *this* commit.

- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().

- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.

- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.

Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.

Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
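
The call order in initclocks() described above can be sketched with stubs. This is an illustration only: the `_stub` functions, `call_log`, and `initclocks_sketch` are hypothetical and merely record the sequence, whereas the real routines program hardware.

```c
#include <string.h>

static char call_log[64];

/* Append a step name to the log, never overflowing the buffer. */
static void
log_call(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Machine-dependent: would set hz, stathz, profhz and init clockintr. */
static void
cpu_initclocks_stub(void)
{
	log_call("init;");
}

/* Machine-independent setup that now runs between the two MD steps. */
static void
mi_clock_setup_stub(void)
{
	log_call("mi;");
}

/* Machine-dependent: would start the clock interrupt on this CPU. */
static void
cpu_startclock_stub(void)
{
	log_call("start;");
}

void
initclocks_sketch(void)
{
	call_log[0] = '\0';
	cpu_initclocks_stub();
	mi_clock_setup_stub();
	cpu_startclock_stub();
}

const char *
initclocks_log(void)
{
	return call_log;
}
```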


# 671537bf 25-Jul-2023 cheloha <cheloha@openbsd.org>

statclock: move profil(2), GPROF code to profclock(), gmonclock()

This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.

- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().

- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().

- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().

- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.

- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.

- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().

With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.


# c4936e80 13-Jul-2023 kettenis <kettenis@openbsd.org>

Use the deep idle state available on Apple M1/M2 cores in the idle loop and
for suspend. This state makes the CPU lose some of its register state so
we need to save these registers before putting the core to sleep and
restore them when we wake up. This deep idle state has a higher wakeup
latency than the normal WFI idle state. Use similar logic as acpicpu(4) to
decide which idle state to pick.

If some cores of a cluster are in this deep idle state, turbo states become
available to the cores that remain active. So stop skipping these states.
This improves single-core performance a little bit.

The main win is in power savings when running in a state with a high clock
frequency. My M2 Pro mini goes from 14W to 6.5W when idle at the maximum
clock frequency. But even at the lowest clock frequency there are small
but significant power savings.

ok deraadt@, tobhe@
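
The trade-off between wakeup latency and power savings suggests a latency-aware choice of idle state. The sketch below shows one plausible heuristic and is not the logic used in the kernel: the function `idle_pick`, its parameters, and the factor of 3 are assumptions made for illustration.

```c
#include <stdint.h>

enum idle_state { IDLE_WFI, IDLE_DEEP };

/*
 * Pick the deep state only when the previous idle period was long
 * compared to the deep state's wakeup latency. The factor 3 is an
 * illustrative assumption, not a value taken from the kernel.
 */
enum idle_state
idle_pick(uint64_t prev_idle_us, uint64_t deep_latency_us)
{
	if (prev_idle_us > 3 * deep_latency_us)
		return IDLE_DEEP;
	return IDLE_WFI;
}
```

Short idle periods stay in WFI, where the cost of saving and restoring register state would outweigh the power saved.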


# 4171e492 10-Jun-2023 kettenis <kettenis@openbsd.org>

Implement support for pointer authentication (PAC) in userland. With PAC
it is possible to "sign" pointers with a hidden key. The signature is
placed in unused bits of the pointer and can be checked later. This can
be used to provide "tail CFI" that is similar to what retguard provides.

Debuggers need to be aware of the fact that pointers can be signed. For
this purpose a new PT_PACMASK ptrace(2) request is introduced that returns
a mask that indicates the bits used for the signature. Separate masks
are provided for code and data pointers even though the masks are identical
in the current implementation. These masks are also written into a special
note section in the core dump.

ok patrick@
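
A debugger that has obtained such a mask can recover the plain address by clearing the signature bits. The sketch below assumes a userland pointer whose unsigned upper bits are zero; the function name `pac_strip` is hypothetical and the mask is passed in as a parameter because its actual value depends on the system.

```c
#include <stdint.h>

/*
 * Clear the signature bits from a userland pointer.
 * Assumes the bits covered by the mask are zero in an unsigned pointer.
 */
uint64_t
pac_strip(uint64_t ptr, uint64_t mask)
{
	return ptr & ~mask;
}
```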


# 5f4ad52d 19-Feb-2023 kettenis <kettenis@openbsd.org>

Add support for deep(er) idle states that can be entered using PSCI. For
now this only supports states advertised in device trees, but ACPI support
could be added as well. The parsing of the idle states as well as the
heuristic to pick the deepest one is probably a bit too simple, but more
complex cases can be added later. Worst case cores will use WFI and use
more power in suspend.

ok phessler@


# 6dd70ce4 31-Dec-2022 patrick <patrick@openbsd.org>

Add machdep.lidaction to machdep names list.

ok mpi@


# 16569a75 10-Dec-2022 patrick <patrick@openbsd.org>

Mitigate Spectre-BHB by using core-specific trampoline vectors. On some cores
Spectre-BHB can be mitigated by using a loop that branches a number of times.
For cores where this does not suffice, or where Spectre-V2 needs to be handled
as well, try and call into a new PSCI function that mitigates both Spectre-V2
and Spectre-BHB. Some newer machines, which might not be in anyone's hands
yet, have an instruction (CLRBHB) that clears the BHB. If ECBHB is set, the
BHB isn't vulnerable. If we have CSV2_3/HCXT, it's not vulnerable at all.

No visible performance dropoff on a MacchiatoBin (4xA72) or Lenovo x13s (4xA78C+
4xX1C), but around 2-3% on a LX2K (16xA72) and RK3399 (4xA53+2xA72).

ok kettenis@


# ec9fe3b7 26-Nov-2022 tobhe <tobhe@openbsd.org>

Add arm64 lid_action sysctl for Apple Silicon laptops.

ok kettenis@


# c7654cd6 24-Nov-2022 kettenis <kettenis@openbsd.org>

Expose the complete set of ID registers as defined in the current version
of ARMv8/ARMv9. Make sure we only expose the features that we know about
and support in our kernel. This matches what Linux does. For now, mostly
restrict ourselves to features defined in ARMv8.5 which means that we only
actually implement support for ID_AA64ISAR0_EL1, ID_AA64ISAR1_EL1,
ID_AA64PFR0_EL1 and ID_AA64PFR1_EL1. For the other registers we simply
always return 0.

ok deraadt@
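
Exposing only known features amounts to masking the raw register value. The sketch below is a simplified illustration, assuming that zero is the safe "not implemented" value for every hidden field and that fields are 4 bits wide, as most AArch64 ID register fields are. The names `ID_FIELD_MASK`, `id_reg_sanitize`, and `id_reg_field` are hypothetical.

```c
#include <stdint.h>

/* Mask for one 4-bit field starting at the given bit position. */
#define ID_FIELD_MASK(shift)	((uint64_t)0xf << (shift))

/* Keep only the fields listed in known_mask; everything else reads as 0. */
uint64_t
id_reg_sanitize(uint64_t raw, uint64_t known_mask)
{
	return raw & known_mask;
}

/* Extract one 4-bit field from a (sanitized) register value. */
unsigned int
id_reg_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}
```

A register the kernel does not support at all corresponds to a `known_mask` of 0, which matches the "always return 0" behaviour described above.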


# da1f63bf 08-Nov-2022 mlarkin <mlarkin@openbsd.org>

KNF/whitespace - no code change


# e3d8572a 08-Nov-2022 cheloha <cheloha@openbsd.org>

arm64: switch to clockintr(9)

Switch arm64 to the clockintr(9) subsystem.

- Remove the custom per-CPU clock interrupt schedule from agtimer(4).
- Remove the custom randomized statclock() pieces from agtimer(4).
- Add agtimer_rearm(), agtimer_trigger(), and wire up agtimer_intrclock.

There is one wart:

- The AArch64 spec says that a value written to CNTV_TVAL_EL0 is
"treated as a signed 32-bit integer" [1]. kettenis@ doesn't know
what to make of this. I'm capping the value at INT32_MAX for
now. It's possible I am misreading this, though.

Tested by kettenis@ on his Apple M1 mini. Tested by me on my
Raspberry Pi 4B.

Link: https://marc.info/?l=openbsd-tech&m=166776342503304&w=2

[1] "Arm Architecture Reference Manual for A-profile architecture"
issue I.a, section D17.11.27 ("CNTV_TVAL_EL0").

ok kettenis@
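
The INT32_MAX cap mentioned in the wart can be sketched as a small clamp. This is an illustration, not the agtimer(4) code: the function name `tval_clamp` and the idea of passing the deadline as a 64-bit cycle count are assumptions.

```c
#include <stdint.h>

/*
 * Clamp a cycle count so that it stays positive when the timer
 * hardware interprets the written value as a signed 32-bit integer.
 */
uint32_t
tval_clamp(uint64_t cycles)
{
	if (cycles > INT32_MAX)
		return INT32_MAX;
	return (uint32_t)cycles;
}
```

A deadline further away than the cap simply causes an earlier interrupt, after which the timer can be rearmed for the remainder.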


# dd81489d 29-Aug-2022 jsg <jsg@openbsd.org>

use ansi volatile keyword, not __volatile
ok miod@ guenther@


# 4002e08d 13-Jul-2022 kettenis <kettenis@openbsd.org>

Implement the fundamentals for suspend/resume on arm64. This uses PSCI
to turn off the secondary CPUs and suspend the primary CPU using the
CPU_OFF and SYSTEM_SUSPEND calls. A new "halt" IPI is added to turn off
the secondary CPUs. This IPI is implemented for the ampintc(4) and
agintc(4) interrupt controllers. Full suspend/resume support is only
implemented for ampintc(4). This is enough to suspend and resume boards
based on the Allwinner A64 SoC, provided the necessary wakeup interrupts
have been set up (not part of this commit).

ok patrick@


# d8d35dc4 16-Jun-2022 kettenis <kettenis@openbsd.org>

Bump MAXCPUS to 256, which is the maximum number of cores on a dual socket
machine with Ampere Altra Max CPUs. OpenBSD should run on such a machine
now.

ok patrick@, deraadt@

