Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2 |
|
#
add2647e |
| 28-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Improve debugging of spurious interrupts
* Report spurious T_RESERVED interrupt vectors / trap numbers. Report the actual trap number and try to ignore it. If this occurs it helps with debugging as a cold boot 'vmstat -i -v' can be matched up against the spurious interrupt number (usually spurious# - 0x20) to locate the PCI device causing the problem.
|
Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3 |
|
#
3cc72d3d |
| 17-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
Revert "kernel - Clean up direction flag on syscall entry"
Actually not needed, the D flag is cleared via the mask set in MSR_SF_MASK. Revert.
This reverts commit cea0e49dc0b2e5aea1b929d02f12d00df66528e2.
|
#
cea0e49d |
| 17-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Clean up direction flag on syscall entry
* Make sure the direction flag is clear on syscall entry. Don't trust userland.
|
Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1 |
|
#
9ee3b786 |
| 21-Sep-2018 |
Sascha Wildner <saw@online.de> |
kernel: Remove some obsolete commented out code.
|
Revision tags: v5.2.2 |
|
#
9474cbef |
| 11-Jun-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
Kernel - Additional cpu bug hardening part 2/2
* Due to speculative instruction execution, the kernel may speculatively execute instructions using data from registers that still contain userland-controlled content.
Reduce the chance of this situation arising by proactively clearing all user registers after saving them for syscalls, exceptions, and interrupts. In addition, for system calls, zero-out any unrestored registers on-return to avoid leaking kernel data back to userland.
* This was discussed over the last few months in various OS groups and I've decided to implement it. After the FP debacle, it is prudent to also give general registers similar protections.
|
Revision tags: v5.2.1 |
|
#
85b33048 |
| 01-May-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix CVE-2018-8897, debug register issue
* #DB can be delayed in a way that causes it to occur on the first instruction of the int $3 or syscall handlers. These handlers must be able to detect and handle the condition. This is a historical artifact of cpu operation that has existed for a very long time on both AMD and Intel CPUs.
* Fix by giving #DB its own trampoline stack and a way to load a deterministic %gs and %cr3 independent of the normal CS check. This is CVE-2018-8897.
* Also fix the NMI trampoline while I'm here.
* Also fix an old issue with debug register trace traps which can occur when the kernel is accessing the user's address space. This fix was lost years ago, now recovered.
Credits: Nick Peterson of Everdox Tech, LLC (original reporter)
Credits: Thanks to Microsoft for coordinating the OS vendor response
|
Revision tags: v5.2.0, v5.3.0, v5.2.0rc |
|
#
26c7e964 |
| 11-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Implement spectre mitigations part 3 (stabilization)
* Fix a bug in the system call entry code. The wrong stack pointer was being loaded for KMMUENTRY_SYSCALL and KMMUENTRY_SYSCALL was using an offset that did not exist in certain situations.
* Load the correct stack pointer, but also change KMMUENTRY_CORE to not use stack-relative loads and stores. Instead it uses the trampframe directly via %gs:BLAH
Reported-by: zrj
|
#
9283c84b |
| 10-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Implement spectre mitigations part 2
* NOTE: The last few commits may have said 'IBPB' but they really meant 'IBRS'. The last few commits added IBRS support; this one cleans that up and adds IBPB support.
* Intel says for IBRS always-on mode (mode 2), SPEC_CTRL still has to be poked on every user->kernel entry as a barrier, even though the value is not being changed. So make this change. This actually somewhat improves performance a little on Skylake and later versus when I just set it globally and left it that way.
* Implement IBPB detection and support on Intel. At the moment we default to turning it off because the performance hit is pretty massive. Currently the work on Linux is only using IBPB for VMM-related operations and not for user->kernel entry.
* Enhance the machdep.spectre_mitigation sysctl to print out what the mode matrix is whenever you change it, in human-readable terms.

    0   IBRS disabled                IBPB disabled
    1   IBRS mode 1 (kernel-only)    IBPB disabled
    2   IBRS mode 2 (at all times)   IBPB disabled
    4   IBRS disabled                IBPB enabled
    5   IBRS mode 1 (kernel-only)    IBPB enabled
    6   IBRS mode 2 (at all times)   IBPB enabled

Currently we default to (1) instead of (5) when we detect that the microcode supports both features. IBPB is not turned on by default (you can see why below).
* Haswell and Skylake performance loss matrix using the following test. This tests a high-concurrency compile, which is approximately a 5:1 user:kernel test with high concurrency.

    The Haswell box is: i3-4130 CPU @ 3.40GHz (2-core/4-thread)
    The Skylake box is: i5-6500 CPU @ 3.20GHz (4-core/4-thread)

This does not include MMU isolation losses, which will add another 3-4% or so in losses.

    (/usr/obj on tmpfs) time make -j 8 nativekernel NO_MODULES=TRUE

    PERFORMANCE LOSS MATRIX
                 HASWELL             SKYLAKE
             IBPB=0   IBPB=1     IBPB=0   IBPB=1
    IBRS=0     0%       12%        0%       17%
    IBRS=1   >12%<      21%      >2.4%<     15%
    IBRS=2    58%       60%       23%       32%

Note that the automatic default when microcode support is detected is IBRS=1, IBPB=0 (12% loss on Haswell and 2.4% loss on Skylake for this test). If we add 3-4% or so for MMU isolation, a Haswell cpu loses around 16% and a Skylake cpu loses around 6% or so in performance.

    PERFORMANCE LOSS MATRIX (including 3% MMU isolation losses)
                 HASWELL             SKYLAKE
             IBPB=0   IBPB=1     IBPB=0   IBPB=1
    IBRS=0     3%       15%        3%       20%
    IBRS=1   >15%<      24%      >5.4%<     18%
    IBRS=2    61%       63%       26%       35%
|
#
8ed06571 |
| 10-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Implement spectre mitigations part 1
* Implement machdep.spectre_mitigation. This can be set as a tunable or sysctl'd later. The tunable is only applicable if the BIOS has the appropriate microcode, otherwise you have to update the microcode first and then use sysctl to set the mode.
This works similarly to Linux's IBRS support.
mode 0 - Spectre IBPB MSRs disabled
mode 1 - Sets IBPB MSR on USER->KERN transition and clears it on KERN->USER.
mode 2 - Leave IBPB set globally. Do not toggle on USER->KERN or KERN->USER transitions.
* Retest spectre microcode MSRs on microcode update.
* Spectre mode 1 is enabled by default if the microcode supports it. (We might change this to disabled by default; I'm still mulling it over.)
* General performance effects (not counting the MMU separation mode, which is machdep.meltdown_mitigation and adds another 3% in overhead):
Skylake loses around 5% for mode 1 and 12% for mode 2, versus mode 0. Haswell loses around 12% for mode 1 and 53% for mode 2, versus mode 0.
Add another 3% if MMU separation is also turned on (aka machdep.meltdown_mitigation).
* General system call overhead effects on Skylake:

    machdep.meltdown_mitigation=0, machdep.spectre_mitigation=0   103ns
    machdep.meltdown_mitigation=1, machdep.spectre_mitigation=0   360ns
    machdep.meltdown_mitigation=1, machdep.spectre_mitigation=1   848ns
    machdep.meltdown_mitigation=1, machdep.spectre_mitigation=2   404ns

Note that mode 1 has better overall performance for mixed user+kernel workloads despite having a much higher system call overhead, whereas mode 2 has lower system call overhead but generally lower overall performance because IBPB is enabled in usermode.
|
#
6a65d560 |
| 06-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 5
* Fix iretq fault handling. As I thought, I messed it up with the trampoline patches. Fixing it involves issuing the correct KMMU* macros to ensure that the code is on the correct stack and has the correct mmu context.
Revalidate with a test program that uses a signal handler to change the stack segment descriptor to something it shouldn't be.
* Get rid of the "kernel trap 9..." console message for the iretq fault case.
|
#
9e24b495 |
| 05-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 3/3
* Implement the isolated pmap template, iso_pmap. The pmap code will generate a dummy iso_pmap containing only the kernel mappings required for userland to be able to transition into the kernel and vice-versa.
The mappings needed are:
(1) The per-cpu trampoline area for our stack (rsp0)
(2) The global descriptor table (gdt) for all cpus
(3) The interrupt descriptor table (idt) for all cpus
(4) The TSS block for all cpus (we store this in the trampoline page)
(5) Kernel code addresses for the interrupt vector entry and exit
* In this implementation the 'kernel code' addresses are currently just btext to etext. That is, the kernel's primary text area. Kernel data and bss are not part of the isolation map.
TODO - just put the vector entry and exit points in the map, and not the entire kernel.
* System call performance is reduced when isolation is turned on, 100ns -> 350ns or so. However, typical workloads should not lose more than 5% performance or so. System-call-heavy and interrupt-heavy workloads (network, database, high-speed storage, etc) can lose a lot more performance.
We leave the trampoline code in place whether isolation is turned on or not. The trampoline overhead, without isolation, is only 5ns or so.
* Fix a missing exec-related trampoline initialization.
* Clean up kernel page table PTEs a bit. PG_M is ignored on non-terminal PTEs, so don't set it. Also don't set PG_U in non-terminal kernel page table pages (PG_U is never set on terminal PTEs so this wasn't a problem, but we should be correct).
* Fix a bug in fast_syscall's trampoline stack. The wrong stack pointer was being loaded.
* Move mdglobaldata->gd_common_tss to privatespace->common_tss. Place common_tss in the same page as the trampoline to reduce exposure to globaldata from the isolated MMU context.
* 16-byte align struct trampframe for convenience.
* Fix a bug in POP_FRAME. Always cli in order to avoid getting an interrupt just at the iretq instruction, which might be misinterpreted.
|
#
fc921477 |
| 04-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 2/3
* Cleanup pass. Throw in some documentation.
* Move the gd_pcb_* fields into the trampoline page to allow kernel memory to be further restricted in part 3.
|
#
4611d87f |
| 03-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 1/3
* Part 1/3 of the fix for the Intel user/kernel separation MMU bug. It appears that it is possible to discern the contents of kernel memory with careful timing measurements of instructions due to speculative memory reads and speculative instruction execution by Intel cpus. This can happen because Intel will allow both to occur even when the memory access is later disallowed due to privilege separation in the PTE.
Even though the execution is always aborted, the speculative reads and speculative execution results in timing artifacts which can be measured. A speculative compare/branch can lead to timing artifacts that allow the actual contents of kernel memory to be discerned.
While there are multiple speculative attacks possible, the Intel bug is particularly bad because it allows a user program to more or less effortlessly access kernel memory (and if a DMAP is present, all of physical memory).
* Part 1 implements all the logic required to load an 'isolated' version of the user process's PML4e into %cr3 on all user transitions, and to load the 'normal' U+K version into %cr3 on all transitions from user to kernel.
* Part 1 fully allocates, copies, and implements the %cr3 loads for the 'isolated' version of the user process PML4e.
* Part 1 does not yet actually adjust the contents of this isolated version to replace the kernel map with just a trampoline map in kernel space. It does remove the DMAP as a test, though. The full separation will be done in part 3.
|
#
e1caeca9 |
| 19-Dec-2017 |
zrj <rimvydas.jasinskas@gmail.com> |
i386 removal, part 63/x: Remove some leftovers in segments.h
Last users were removed in 8c2a9b77413a2154aa54084381f6703dd5446fa4
|
Revision tags: v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1 |
|
#
87ef2da6 |
| 23-Jul-2017 |
zrj <rimvydas.jasinskas@gmail.com> |
sys: Some whitespace cleanup.
While there, fix indentation and few typos a bit. No functional change.
|
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc |
|
#
d6e8ab2d |
| 18-Oct-2016 |
Sascha Wildner <saw@online.de> |
kernel: Remove the COMPAT_43 kernel option along with all related code.
It is commented out in our default kernel config files for almost five years now, since 9466f37df5258f3bc3d99ae43627a71c1c085e7d.
Approved-by: dillon
Dragonfly-bug: <https://bugs.dragonflybsd.org/issues/2946>
|
Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2 |
|
#
c66c7e2f |
| 25-Jan-2016 |
zrj <rimvydas.jasinskas@gmail.com> |
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
|
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3 |
|
#
32d3bd25 |
| 06-Jan-2015 |
Sascha Wildner <saw@online.de> |
kernel/pc64: Change all the remaining #if JG's to #if 0 (fixing -Wundef).
|
Revision tags: v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
3338cc67 |
| 12-Dec-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Misc fixes and debugging
* Add required CLDs in the exception paths. The interrupt paths already CLD in PUSH_FRAME.
* Note that the fast_syscall (SYSENTER) path has an implied CLD due to the hardware mask applied to rflags.
* Add the IOPL bits to the set of bits set to 0 during a fast_syscall.
* When creating a dummy interrupt frame we don't have to push the actual %cs. Just push $0 so the frame isn't misinterpreted as coming from userland.
* Additional debug verbosity for freeze_on_seg_fault.
* Reserve two void * fields for LWP debugging (for a later commit)
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
#
f2081646 |
| 12-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add syscall quick return path for x86-64
* Flag the case where a sysretq can be performed to quickly return from a system call instead of having to execute the slower doreti code.
* This about halves syscall times for simple system calls such as getuid(), and reduces longer syscalls by ~80ns or so on a fast 3.4GHz SandyBridge, but does not seem to really affect performance a whole lot.
Taken-From: FreeBSD (loosely)
|
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0 |
|
#
b2b3ffcd |
| 04-Nov-2009 |
Simon Schubert <corecode@dragonflybsd.org> |
rename amd64 architecture to x86_64
The rest of the world seems to call amd64 x86_64. Bite the bullet and rename all of the architecture files and references. This will hopefully make pkgsrc builds less painful.
Discussed-with: dillon@
|
#
c1543a89 |
| 04-Nov-2009 |
Simon Schubert <corecode@dragonflybsd.org> |
rename amd64 architecture to x86_64
The rest of the world seems to call amd64 x86_64. Bite the bullet and rename all of the architecture files and references. This will hopefully make pkgsrc builds less painful.
Discussed-with: dillon@
|
#
cc9b6223 |
| 24-Mar-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Revamp LWKT thread migration
* Rearrange the handling of TDF_RUNNING, making lwkt_switch() responsible for it instead of the assembly switch code. Adjust td->td_switch() to return the previously running thread.
This allows lwkt_switch() to process thread migration between cpus after the thread has been completely and utterly switched out, removing the need to loop on TDF_RUNNING on the target cpu.
* Fixes lwkt_setcpu_remote livelock failure
* This required major surgery on the core thread switch assembly, testing is needed. I tried to avoid doing this but the livelock problems persisted, so the only solution was to remove the need for the loops that were causing the livelocks.
* NOTE: The user process scheduler is still using the old giveaway/acquire method. More work is needed here.
Reported-by: "Magliano Andre'" <masterblaster@tiscali.it>
|
#
bd52bedf |
| 31-Jan-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel64 - Fix disabled interrupts during dbg/bpt trap
* Interrupts were left improperly disabled during a dbg or bpt trap. i386 enables interrupts for these traps. x86-64 needs to as well or it will hit an assertion in lwkt_switch() under certain circumstances.
* Make debug code in lwkt_switch() also require INVARIANTS to function.
NOTE: This is temporary debug code and should be removed at some point after 48-core testing is complete.
|