#
2b3f93ea |
| 13-Oct-2023 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add per-process capability-based restrictions
* This new system allows userland to set capability restrictions which turn off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed.
Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features.
* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)
* Add sys/caps.h
* Add the "setcaps" userland utility and manual page.
* Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure.
* The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
|
#
d8215152 |
| 05-Apr-2023 |
Sascha Wildner <saw@online.de> |
kernel/machdep: Add missing opt_maxmem.h include to get at MAXMEM.
|
#
a653ba3a |
| 05-Apr-2023 |
Sascha Wildner <saw@online.de> |
kernel/platform: Remove useless commented out includes of use_npx.h.
|
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1 |
|
#
31815141 |
| 01-Jul-2021 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Expand GDT table to maximum size
* Expand the GDT table from 9 entries to 65536 entries (limit field 0xFFFF).
* This deals with an Intel quirk in VMX where the descriptor for the GDT limit field is not restored on a VM exit, but instead unconditionally set to 0xFFFF.
|
#
c713db65 |
| 24-May-2021 |
Aaron LI <aly@aaronly.me> |
vm: Change 'kernel_pmap' global to pointer type
Following the previous commits, this commit changes the 'kernel_pmap' to pointer type of 'struct pmap *'. This makes it align better with 'kernel_map' and simplifies the code a bit.
No functional changes.
|
#
5936d3e8 |
| 20-May-2021 |
Aaron LI <aly@aaronly.me> |
vm: Change {buffer,clean,pager}_map globals to pointer type
Similar to the previous commit that changes global 'kernel_map' to type of 'struct vm_map *', change related globals 'buffer_map', 'clean_map' and 'pager_map' to pointer type, i.e., 'struct vm_map *'.
No functional changes.
|
#
1eeaf6b2 |
| 20-May-2021 |
Aaron LI <aly@aaronly.me> |
vm: Change 'kernel_map' global to type of 'struct vm_map *'
Change the global variable 'kernel_map' from type 'struct vm_map' to a pointer to this struct. This simplifies the code a bit since all invocations take its address. This change also aligns with NetBSD's 'kernel_map', which is also a pointer, and this helps the porting of NVMM.
No functional changes.
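The effect of this change on call sites can be sketched as follows. This is a minimal illustration, not the kernel's code: the one-field `struct vm_map`, the `map_size()` helper, and the `kernel_map_old` / `kernel_map_store` names are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real struct vm_map has many more fields. */
struct vm_map {
    size_t size;
};

/* Hypothetical consumer that takes a map pointer, as VM routines do. */
static size_t map_size(const struct vm_map *map)
{
    return map->size;
}

/* Before: the global is the struct itself, so callers pass its address. */
static struct vm_map kernel_map_old = { 4096 };

/* After: the storage is separate and the global is already a pointer. */
static struct vm_map kernel_map_store = { 4096 };
static struct vm_map *kernel_map = &kernel_map_store;

static size_t size_before(void)
{
    return map_size(&kernel_map_old);   /* explicit address-of */
}

static size_t size_after(void)
{
    return map_size(kernel_map);        /* pointer passed directly */
}
```

Both helpers return the same value; only the spelling at the call site changes, which matches the "no functional changes" note.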
|
Revision tags: v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2 |
|
#
a4e142f0 |
| 29-Aug-2020 |
Aaron LI <aly@aaronly.me> |
x86_64/machdep.c: Fix two minor typos and indentation
|
#
f0ee3437 |
| 29-Aug-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix KVM implosion by enabling the IOAPIC
* We were disabling the IOAPIC universally when running under a VM, because older VMs sometimes broke when it was enabled.
However, this can actually implode the virtual machine by causing interrupt routing to go haywire.
Add a case statement and always enable the IOAPIC on bare hardware and KVM guests.
* Change the default to also enable the IOAPIC on all other VMs. Cases can be added for specific disablement if necessary.
* Fixes Google Cloud Environment booting.
|
#
9d724079 |
| 26-Aug-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Document confusing physmap[1] initialization
* Document confusing physmap[1] initialization
|
#
b1122e3c |
| 19-Aug-2020 |
Michael Neumann <mneumann@ntecs.de> |
kernel - Remove machdep.hack_efifb_probe_early hack
* This hack was introduced as a temporary fix for bug #3167 (see commit c2a57f42).
* Since commit faeb2db "kernel - Hack the DMAP size" the temporary hack is no longer required.
* Tested on TUXEDO InfinityBook Pro 14v4 laptop where this bug initially occurred.
|
#
ebc415a3 |
| 29-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Cleanup recent DMAP hacks for EFI framebuffers
* Adjust the EFI framebuffer code to test against DMapMaxAddress instead of DMAP_MAX_ADDRESS.
* Remove the temporary hack that burned memory with a too-large minimum DMAP size.
|
#
add2647e |
| 28-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Improve debugging of spurious interrupts
* Report spurious T_RESERVED interrupt vectors / trap numbers. Report the actual trap number and try to ignore it. If this occurs it helps with debugging as a cold boot 'vmstat -i -v' can be matched up against the spurious interrupt number (usually spurious# - 0x20) to locate the PCI device causing the problem.
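The "spurious# - 0x20" rule of thumb can be written as a small helper. This is only a sketch of the arithmetic described above; the function name and the `-1` error convention are made up for illustration, and the 0x20 offset is the "usual" case from the commit text, not a guarantee.

```c
#include <assert.h>

/* Usual offset between a reported vector number and the interrupt number
 * shown by 'vmstat -i -v', per the commit text (an assumption here). */
#define SPURIOUS_VECTOR_OFFSET 0x20

/* Hypothetical helper: map a reported spurious vector to the interrupt
 * number to look for, or -1 if the vector is below the offset. */
static int spurious_to_intr(int vector)
{
    if (vector < SPURIOUS_VECTOR_OFFSET)
        return -1;
    return vector - SPURIOUS_VECTOR_OFFSET;
}
```

For instance, a reported vector of 0x2b would point at interrupt 11 in the cold-boot `vmstat -i -v` listing.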
|
#
80d831e1 |
| 25-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Refactor in-kernel system call API to remove bcopy()
* Change the in-kernel system call prototype to take the system call arguments as a separate pointer, and make the contents read-only.
Before: int sy_call_t (void *);
After:  int sy_call_t (struct sysmsg *sysmsg, const void *);
* System calls with 6 arguments or less no longer need to copy the arguments from the trapframe to a holding structure. Instead, we simply point into the trapframe.
The L1 cache footprint will be a bit smaller, but in simple tests the results are not noticeably faster... maybe 1ns or so (roughly 1%).
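A rough sketch of what a handler looks like under the new prototype is shown below. Only the `sy_call_t` shape comes from the commit text; the one-field `struct sysmsg`, the `struct add_args` argument block, and both functions are hypothetical and exist purely to show the read-only, separately passed argument pointer.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct sysmsg. */
struct sysmsg {
    long result;
};

/* Hypothetical argument block for an imaginary two-argument call. */
struct add_args {
    long a;
    long b;
};

/* New-style prototype from the commit: message plus read-only arguments. */
typedef int sy_call_t(struct sysmsg *sysmsg, const void *uap);

/* Imaginary handler: reads its arguments in place, writes only the result. */
static int sys_add_example(struct sysmsg *sysmsg, const void *uap)
{
    const struct add_args *args = uap;

    sysmsg->result = args->a + args->b;
    return 0;
}

/* Imaginary dispatcher: hands the caller's argument storage straight to
 * the handler instead of copying it into a holding structure first. */
static long dispatch_example(sy_call_t *fn, const void *uap)
{
    struct sysmsg msg = { 0 };

    if (fn(&msg, uap) != 0)
        return -1;
    return msg.result;
}
```

Because the handler only receives a `const` pointer, the dispatcher is free to point it at wherever the arguments already live, which is the property the commit relies on to drop the copy.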
|
Revision tags: v5.8.1 |
|
#
f354e0e6 |
| 28-Mar-2020 |
Sascha Wildner <saw@online.de> |
kernel: Remove <sys/mutex.h> from all files that don't need it.
98% of these were remains from porting from FreeBSD which could have been removed after converting to lockmgr(), etc.
While here, do the same for <sys/mutex2.h>.
|
#
63823918 |
| 11-Mar-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Allow 8254 timer to be forced, clean-up user/sys/intr/idle
* Allows the 8254 timer to be forced on for machines which do not support the LAPIC timer during deep-sleep. Fix an assertion that occurs in this situation.
hw.i8254.intr_disable="0"
* Adjust the statclock to calculate user/sys/intr/idle time properly when the clock interrupt occurs from an interrupt thread instead of from a hard interrupt.
Basically when the clock interrupt occurs from an interrupt thread, we have to look at curthread->td_preempted instead of curthread.
In addition RQF_INTPEND will be set across the call due to the way processing works and we have to look at the bitmask of interrupt sources instead of this bit.
Reported-by: CuteLarva
|
Revision tags: v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0 |
|
#
921ef7b6 |
| 15-Jun-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix SMAP/SMEP caught user mode access part 2/2.
* Finish implementing SMAP exception handling support by properly detecting it in trap() and generating a panic(). Otherwise the cpu just locks up in a page-fault loop without any indication as to why on the console.
* To properly support SMAP, make sure AC is cleared on system calls (it is already cleared on any interrupt or exception by the frame push code but I missed the syscall entry code).
|
#
588042b5 |
| 12-Jun-2019 |
Sascha Wildner <saw@online.de> |
<sys/signal.h>: Adjust the type of stack_t's ss_sp from char * to void *.
|
Revision tags: v5.6.0rc1, v5.7.0 |
|
#
c9678a7e |
| 22-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Reduce/refactor nbuf and maxvnodes calculations.
* The prime motivation for this commit is to target about 1/20 (5%) of physical memory for use by the kernel. These changes significantly reduce kernel memory usage on systems with less than 4GB of ram (and more specifically for systems with less than 1TB of ram), and also emplace more reasonable caps on systems with 128GB+ of ram.
These changes return 100-200MB of ram to userland on systems with 1GB of ram, and return around 6.5GB of ram on systems with 128G of ram.
* The nbuf calculation and related code documentation was a bit crufty, still somewhat designed for an earlier era and was calculating about twice the stated 5% target. For systems with 128GB of ram or less the calculation was simply creating too many filesystem buffers, allowing as much as 10% of physical memory to be locked up by the buffer cache.
Particularly on small systems, this 10% plus other kernel overheads left a lot less memory available for user programs than we would have liked. This work gets us closer to the 5% target.
* Change the base calculation from 1/10 of physical memory to 1/20 of physical memory, cutting the number of buffers in half on most systems. The code documentation stated 1/20 but was actually calculating 1/10.
* On large memory systems > 100GB the number of buffers is now capped at around 400000 or so (allowing the buffer cache to use around 6.5 GBytes). This cap was previously based on a relatively disconnected parameter relating to available memory in early boot, and when triggered it actually miscalculated nbufs to be double the intended number.
The new cap is based on a fixed maximum of 500MB worth of struct bufs, roughly similar to the original intention. This change reduces the number of buffers reserved on systems with more than around 100GB of ram from around 12GB worth of data down to 6.5GB.
* With the BKVABIO work eliminating most SMP invltlbs on buffer recyclement, there is no real reason to need a huge buffer cache. Just make sure it's big enough on large-memory machines to fully cache the likely live datasets for things like bulk compiles and such.
* For kern.maxvnodes (which can be changed at run-time if you desire), the base calculation on systems with less than 1GB of ram has been cut in half (~60K vnodes to ~30K vnodes). It will ramp up more slowly until it roughly matches the prior calculation at 4GB of system memory. On systems with enough memory, maxvnodes is now explicitly capped at 4M.
There generally is no need to allow an excessive number of vnodes to be cached.
For HAMMER1 you can set vfs.hammer.double_buffer=1 to cause it to cache data from the underlying device, allowing it to utilize all available free(ish) memory regardless of the maxvnodes setting.
HAMMER2 caches disk blocks in the underlying device by default. The vnode-based vm_object caches decompressed data, so we want to have enough vnodes for nominal heavily parallel bulk operations to avoid unnecessary re-lookups of the vnode as well as avoid having to decompress the same thing over and over again.
In both cases an excessively high kern.maxvnodes actually wastes memory on both HAMMER1 and HAMMER2... or at least makes the pageout daemon's job more difficult.
* Remove vfs.maxmallocbufspace. It is no longer connected to anything.
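The shape of the buffer-count calculation described above (a 1/20 base, then a cap derived from a fixed budget for struct bufs) might be sketched like this. It is not the kernel's code: the function name and all four parameters are hypothetical, and the per-buffer data size and struct buf size are left as inputs because the commit text does not state them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: size the buffer cache at 1/20 of physical memory,
 * then cap the count so the buffer headers fit in a fixed budget.
 *   physmem   - physical memory in bytes
 *   buf_data  - assumed data bytes backed by one buffer
 *   buf_hdr   - assumed size of one buffer header (struct buf)
 *   hdr_cap   - budget in bytes for all headers (500MB in the commit text)
 */
static uint64_t nbuf_sketch(uint64_t physmem, uint64_t buf_data,
                            uint64_t buf_hdr, uint64_t hdr_cap)
{
    uint64_t nbuf, max_nbuf;

    if (buf_data == 0 || buf_hdr == 0)
        return 0;

    nbuf = (physmem / 20) / buf_data;   /* 5% base target */
    max_nbuf = hdr_cap / buf_hdr;       /* cap from header budget */

    return nbuf > max_nbuf ? max_nbuf : nbuf;
}
```

With illustrative (assumed, not actual) sizes such as 16KB of data and 1KB of header per buffer, a small machine stays on the 1/20 slope while a very large one hits the header-budget cap.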
|
#
2a7bd4d8 |
| 18-May-2019 |
Sascha Wildner <saw@online.de> |
kernel: Don't include <sys/user.h> in kernel code.
There is really no point in doing that because its main purpose is to expose kernel structures to userland. The majority of cases weren't needed at all and the rest required only a couple of other includes.
|
Revision tags: v5.4.3 |
|
#
cd9c4877 |
| 17-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Implement support for SMAP and SMEP security (3)
* Issue clac after the push on all traps, interrupts, and exceptions.
* Improve code documentation.
|
#
48c77f2b |
| 17-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Implement support for SMAP and SMEP security
* Implement support for SMAP security. This prevents accidental accesses to user address space from the kernel. When available, we wrap intentional user-space accesses from the kernel with the 'stac' and 'clac' instructions.
We use a NOP replacement policy to implement the feature. The wrapper is initially a 'nop %eax' (3-byte NOP), and is replaced by 'stac' and 'clac' via a .section iteration when the feature is supported.
* Implement support for SMEP security. This prevents accidental execution of user code from the kernel and simply requires turning the bit on in CR4.
* Reports support in dmesg via the 'CPU Special Features Installed:' line.
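The NOP replacement policy can be illustrated with a userland simulation over a plain byte buffer. This is a sketch under assumptions, not the kernel's patching code: the `patch_site` table stands in for whatever the `.section` iteration walks, the function name is invented, and a real kernel must also deal with patching live executable text. The byte encodings used are the standard ones for `stac` (0F 01 CB), `clac` (0F 01 CA), and a 3-byte NOP with a register operand (0F 1F C0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const unsigned char NOP3[3] = { 0x0f, 0x1f, 0xc0 };  /* 3-byte NOP */
static const unsigned char STAC[3] = { 0x0f, 0x01, 0xcb };  /* set AC */
static const unsigned char CLAC[3] = { 0x0f, 0x01, 0xca };  /* clear AC */

/* Hypothetical record of one patchable location. */
struct patch_site {
    size_t offset;   /* where the 3-byte placeholder lives */
    bool   is_stac;  /* true: becomes stac, false: becomes clac */
};

/* Replace placeholder NOPs with stac/clac when the feature is present.
 * Returns the number of sites actually patched. */
static size_t apply_smap_patches(unsigned char *text, size_t len,
                                 const struct patch_site *sites,
                                 size_t nsites, bool cpu_has_smap)
{
    size_t i, patched = 0;

    if (!cpu_has_smap)
        return 0;                        /* leave the NOPs in place */

    for (i = 0; i < nsites; i++) {
        size_t off = sites[i].offset;

        if (off > len || len - off < 3)
            continue;                    /* site does not fit in buffer */
        if (memcmp(text + off, NOP3, 3) != 0)
            continue;                    /* not the expected placeholder */
        memcpy(text + off, sites[i].is_stac ? STAC : CLAC, 3);
        patched++;
    }
    return patched;
}
```

Keeping the placeholder the same length as the replacement means no surrounding code has to move, and CPUs without SMAP simply execute the NOPs.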
|
Revision tags: v5.4.2 |
|
#
3407ac90 |
| 24-Mar-2019 |
Sascha Wildner <saw@online.de> |
kernel: Fix indent.
|
#
c2a57f42 |
| 09-Jan-2019 |
Michael Neumann <mneumann@ntecs.de> |
Add work-around for bug #3167
"UEFI boot hangs right after initializing UEFI framebuffer." It actually boots but the system console is not shown.
I had this issue on a TUXEDO InfinityBook Pro 14v4. This commit allows me to boot by setting loader tunable machdep.hack_efifb_probe_early=1.
This commit is not intended to be there forever. It's there for people who experience the same issue and want a quick and easy way to test if this fixes their booting issue.
Discussed-with: dillon
|
#
831b6312 |
| 05-Jan-2019 |
Sascha Wildner <saw@online.de> |
kernel: Remove kernel profiling bits.
It was broken on i386, is even more broken on x86_64 and isn't worth fixing.
Discussed-with: dillon
|