#
a653ba3a |
| 05-Apr-2023 |
Sascha Wildner <saw@online.de> |
kernel/platform: Remove useless commented-out includes of use_npx.h.
|
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3 |
|
#
00780082 |
| 03-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Permanently fix FP bug - completely remove lazy heuristic
* Remove the FP lazy heuristic. When the FP unit is being used by a thread, it will now *always* be actively saved and restored on context switch.
This means that if a process uses the FP unit at all, its context switches (to another thread) will actively save/restore the state forevermore.
* This fixes a known hardware bug on Intel CPUs that we thought was fixed before (by not saving the FP context from thread A from the DNA interrupt on thread B)... but it turns out it wasn't.
We could tickle the bug on Intel CPUs by forcing synth to regenerate its flavor index over and over again. This regeneration fork/execs about 60,000 makes, sequencing concurrently on all cores, and usually hits the bug in less than 5 minutes.
* We no longer support lazy FP restores, period. This is like the fourth time I've tried to deal with this, so now it's time to give up and not use lazy restoration at all, ever again.
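The always-save/restore policy this commit adopts can be modeled in a few lines of user-space C. This is a toy sketch, not the kernel's actual machine-dependent switch code; the names (`toy_thread`, `fpu_regs`, `toy_switch`) are invented for illustration. The point it demonstrates: once a thread has ever touched the FP unit, every switch away actively saves its state and every switch back restores it, with no lazy/DNA-trap path left.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the post-fix policy: once a thread has touched the FP
 * unit (fp_used set), every switch away actively saves its FP state and
 * every switch back restores it.  All names here are illustrative, not
 * the kernel's actual symbols. */
struct toy_thread {
    int  fp_used;            /* thread has touched the FP unit */
    char fp_ctx[64];         /* stand-in for the real FPU save area */
};

static char fpu_regs[64];    /* stand-in for the live FPU registers */

/* Switch from 'from' to 'to': unconditional save/restore when used. */
static void toy_switch(struct toy_thread *from, struct toy_thread *to)
{
    if (from->fp_used)
        memcpy(from->fp_ctx, fpu_regs, sizeof(fpu_regs)); /* active save */
    if (to->fp_used)
        memcpy(fpu_regs, to->fp_ctx, sizeof(fpu_regs));   /* active restore */
    else
        memset(fpu_regs, 0, sizeof(fpu_regs));            /* nothing leaks */
}
```

Note that scrubbing the registers when the incoming thread has no FP state is what closes the cross-thread leak; an FP-using thread never observes another thread's registers because its own state is always reloaded.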
|
Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2 |
|
#
e5aace14 |
| 11-Jun-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
Kernel - Additional cpu bug hardening part 1/2
* OpenBSD recently made a commit that scraps the use of delayed FP state saving due to a rumor that the content of FP registers owned by another process can be speculatively detected when they are present for the current process, even when the TS bit is used to force a DNA trap.
This rumor has been circulating for a while. OpenBSD felt that the lack of responsiveness from Intel forced their hand. Since they've gone ahead and pushed a fix for this potential problem, we are going to as well.
* DragonFlyBSD already synchronously saves FP state on switch-out. However, it only cleans the state up afterwards by calling fninit and this isn't enough to actually erase the content in the %xmm registers. We want to continue to use delayed FP state restores because it saves a considerable amount of switching time when we do not have to do an FP restore.
Most programs touch the FP registers at startup due to rtld linking, and more and more programs use the %xmm registers as general purpose registers. OpenBSD's solution of always proactively saving and restoring FP state is a reasonable one. DragonFlyBSD is going to take a slightly different tack in order to try to retain more optimal switching behavior when the FP unit is not in continuous use.
* Our first fix is to issue an FP restore on dummy state after our FP save to guarantee that all FP registers are zeroed after FP state is saved. This allows us to continue to support delayed FP state loads with zero chance of leakage between processes.
* Our second fix is to add a heuristic which allows the kernel to detect when the FP unit is being used seriously (versus just during program startup).
We have added a sysctl machdep.npx_fpu_heuristic heuristic for this purpose which defaults to the value 32. Values can be:
0 - Proactive FPU state loading disabled (old behavior retained). Note that the first fix remains active, the FP register state is still cleared after saving so no leakage can occur. All processes will take a DNA trap after a thread switch when they access the FP state.
1 - Proactive FPU state loading is enabled at all times for a thread after the first FP access made by that thread. Upon returning from a thread switch, the FPU state will already be ready to go and a DNA trap will not occur.
N - Proactive FPU state loading enabled for N context switches, then disabled. The next DNA fault after disablement then re-enables proactive loading for the next N context switches.
Default value is 32. This ensures that program startup can use the FP unit but integer-centric programs which don't use it afterwards will not incur the FP switching overhead (for either switch-away or switch-back).
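The three sysctl settings described above (0, 1, N) reduce to a small decision function plus a re-arm hook in the DNA trap. The sketch below is a user-space model with invented names; the real logic lives in the switch and device-not-available trap paths, not in standalone functions like these.

```c
#include <assert.h>

/* Sketch of the machdep.npx_fpu_heuristic policy.  Hypothetical names,
 * illustrative only. */
static int npx_fpu_heuristic = 32;      /* 0, 1, or N as documented */

struct toy_lwp {
    int fp_ever_used;   /* thread has taken at least one DNA trap */
    int fp_countdown;   /* remaining switches with proactive loading */
};

/* Called at switch-in: should we proactively load this thread's FP state? */
static int proactive_fp_load(struct toy_lwp *lp)
{
    if (npx_fpu_heuristic == 0 || !lp->fp_ever_used)
        return 0;                       /* rely on the DNA trap */
    if (npx_fpu_heuristic == 1)
        return 1;                       /* always proactive once used */
    if (lp->fp_countdown > 0) {
        lp->fp_countdown--;
        return 1;                       /* proactive for N switches */
    }
    return 0;                           /* disabled until the next DNA trap */
}

/* Called from the DNA trap: re-arm proactive loading for N switches. */
static void dna_trap(struct toy_lwp *lp)
{
    lp->fp_ever_used = 1;
    if (npx_fpu_heuristic > 1)
        lp->fp_countdown = npx_fpu_heuristic;
}
```

With the default of 32, a burst of FP use (e.g. rtld startup) rides the proactive path, while a long-running integer-only program quickly drains the countdown and stops paying the FP save/restore cost in either direction.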
|
Revision tags: v5.2.1, v5.2.0, v5.3.0, v5.2.0rc |
|
#
fc921477 |
| 04-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 2/3
* Cleanup pass. Throw in some documentation.
* Move the gd_pcb_* fields into the trampoline page to allow kernel memory to be further restricted in part 3.
|
#
4611d87f |
| 03-Jan-2018 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Intel user/kernel separation MMU bug fix part 1/3
* Part 1/3 of the fix for the Intel user/kernel separation MMU bug. It appears that it is possible to discern the contents of kernel memory with careful timing measurements of instructions due to speculative memory reads and speculative instruction execution by Intel cpus. This can happen because Intel will allow both to occur even when the memory access is later disallowed due to privilege separation in the PTE.
Even though the execution is always aborted, the speculative reads and speculative execution results in timing artifacts which can be measured. A speculative compare/branch can lead to timing artifacts that allow the actual contents of kernel memory to be discerned.
While there are multiple speculative attacks possible, the Intel bug is particularly bad because it allows a user program to more or less effortlessly access kernel memory (and if a DMAP is present, all of physical memory).
* Part 1 implements all the logic required to load an 'isolated' version of the user process's PML4e into %cr3 on all user transitions, and to load the 'normal' U+K version into %cr3 on all transitions from user to kernel.
* Part 1 fully allocates, copies, and implements the %cr3 loads for the 'isolated' version of the user process PML4e.
* Part 1 does not yet actually adjust the contents of this isolated version to replace the kernel map with just a trampoline map in kernel space. It does remove the DMAP as a test, though. The full separation will be done in part 3.
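The core of part 1 is keeping two page-table roots per process and choosing which one to load into %cr3 at each privilege transition. The fragment below is a minimal user-space model of that selection; the field names are invented for illustration and do not match the kernel's pmap structure.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the dual-%cr3 scheme: each process keeps a full U+K
 * page-table root and an 'isolated' user root that (after part 3) maps
 * only the trampoline in kernel space.  Illustrative names only. */
struct toy_pmap {
    uint64_t cr3_kernel;    /* normal U+K PML4, loaded entering the kernel */
    uint64_t cr3_isolated;  /* isolated user PML4, loaded entering userland */
};

enum toy_mode { TOY_KERNEL, TOY_USER };

/* Value to load into %cr3 when transitioning into the given mode. */
static uint64_t toy_cr3_for(const struct toy_pmap *pm, enum toy_mode mode)
{
    return (mode == TOY_USER) ? pm->cr3_isolated : pm->cr3_kernel;
}
```

Because user mode runs with a root that simply does not map most kernel memory, speculative reads from userland have nothing to resolve against, which is the property the full separation in part 3 relies on.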
|
#
466d4f43 |
| 19-Dec-2017 |
zrj <rimvydas.jasinskas@gmail.com> |
kernel/pc64: Adjust some references to already removed i386.
While there, perform some whitespace fixes. No functional change.
|
Revision tags: v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1 |
|
#
b1d2a2de |
| 23-Jul-2017 |
zrj <rimvydas.jasinskas@gmail.com> |
sys: Add size directives to assembly functions.
No functional change intended.
|
#
87ef2da6 |
| 23-Jul-2017 |
zrj <rimvydas.jasinskas@gmail.com> |
sys: Some whitespace cleanup.
While there, fix indentation and few typos a bit. No functional change.
|
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc |
|
#
95270b7e |
| 01-Feb-2017 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Many fixes for vkernel support, plus a few main kernel fixes
REAL KERNEL
* The big enchilada is that the main kernel's thread switch code has a small timing window where it clears the PM_ACTIVE bit for the cpu while switching between two threads. However, it *ALSO* checks and avoids loading the %cr3 if the two threads have the same pmap.
This results in a situation where an invalidation on the pmap on another cpu may not have visibility to the cpu doing the switch, and yet the cpu doing the switch also decides not to reload %cr3 and so does not invalidate the TLB either. The result is a stale TLB and bad things happen.
For now just unconditionally load %cr3 until I can come up with code to handle the case.
This bug is very difficult to reproduce on a normal system, it requires a multi-threaded program doing nasty things (munmap, etc) on one cpu while another thread is switching to a third thread on some other cpu.
* KNOTE after handling the vkernel trap in postsig() instead of before.
* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs argument instead of a 32-bit npgs argument. This fixes situations that crop up when a process uses more than 16TB of address space.
* Add an lfence to the pmap invalidation code that I think might be needed.
* Handle some wrap/overflow cases in pmap_scan() related to the use of large address spaces.
* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.
* Test PG_RW after locking the pv_entry to handle potential races.
* Add bio_crc to struct bio. This field is only used for debugging for now but may come in useful later.
* Add some global debug variables in the pmap_inval_smp() and related paths. Refactor the npgs handling.
* Load the tsc_target field after waiting for completion of the previous invalidation op instead of before. Also add a conservative mfence() in the invalidation path before loading the info fields.
* Remove the global pmap_inval_bulk_count counter.
* Adjust swtch.s to always reload the user process %cr3, with an explanation. FIXME LATER!
* Add some test code to vm/swap_pager.c which double-checks that the page being paged out does not get corrupted during the operation. This code is #if 0'd.
* We must hold an object lock around the swp_pager_meta_ctl() call in swp_pager_async_iodone(). I think.
* Reorder when PG_SWAPINPROG is cleared. Finish the I/O before clearing the bit.
* Change the vm_map_growstack() API to pass a vm_map in instead of curproc.
* Use atomic ops for vm_object->generation counts, since objects can be locked shared.
VKERNEL
* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN. This solves a severe FP corruption bug in the vkernel due to calls it makes into libc (which uses %xmm registers all over the place).
This is not a complete fix. We need a formal userspace/kernelspace FP abstraction. Right now the vkernel doesn't have a kernelspace FP abstraction so if a kernel thread switches preemptively bad things happen.
* The kernel tracks and locks pv_entry structures to interlock pte's. The vkernel never caught up, and does not really have a pv_entry or placemark mechanism. The vkernel's pmap really needs a complete re-port from the real-kernel pmap code. Until then, we use poor hacks.
* Use the vm_page's spinlock to interlock pte changes.
* Make sure that PG_WRITEABLE is set or cleared with the vm_page spinlock held.
* Have pmap_clearbit() acquire the pmobj token for the pmap in the iteration. This appears to be necessary, currently, as most of the rest of the vkernel pmap code also uses the pmobj token.
* Fix bugs in the vkernel's swapu32() and swapu64().
* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy the page, along with other vkernel code that busies a page table page. Note however that a page table page is currently never soft-busied.
* Fix some sillycode in a pmap->pm_ptphint test.
* Don't inherit e.g. PG_M from the previous pte when overwriting it with a pte of a different physical address.
* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW (which also clears VPTE_M), and not just VPTE_M. Formally we want the vkernel to be notified when a page becomes modified and it won't be unless we also clear VPTE_RW and force a fault. <--- I may change this back after testing.
* Wrap pmap_replacevm() with a critical section.
* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle it (vm_fault_page() just now got the ability).
* Properly flag VM_FAULT_USERMODE.
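One of the fixes above, widening pmap_inval_smp()'s npgs argument from 32 to 64 bits, comes down to simple arithmetic: with 4 KiB pages, a 16 TiB range is exactly 2^32 pages, so a 32-bit page count wraps to zero at precisely the address-space size the commit mentions. A minimal illustration (toy function names, not the kernel's API):

```c
#include <assert.h>
#include <stdint.h>

/* Why the invalidation page count needed 64 bits: 16 TiB = 2^44 bytes,
 * and 2^44 / 2^12 = 2^32 pages, which does not fit in a uint32_t. */
#define TOY_PAGE_SIZE 4096ULL

static uint64_t toy_npgs64(uint64_t bytes)
{
    return bytes / TOY_PAGE_SIZE;            /* correct for any range */
}

static uint32_t toy_npgs32(uint64_t bytes)
{
    return (uint32_t)(bytes / TOY_PAGE_SIZE); /* wraps at 16 TiB */
}
```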
|
Revision tags: v4.6.1, v4.6.0 |
|
#
b4758707 |
| 26-Jul-2016 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Disable lwp->lwp optimization in thread switcher
* Put #ifdef around the existing lwp->lwp switch optimization and then disable it. This optimization tries to avoid reloading %cr3 and to avoid pmap->pm_active atomic ops when switching to a lwp that shares the same process.
This optimization is no longer applicable on multi-core systems as such switches are very rare. LWPs are usually distributed across multiple cores so rarely does one switch to another on the same core (and in cpu-bound situations, the scheduler will already be in batch mode). The conditionals in the optimization, on the other hand, did measurably (just slightly) reduce performance for normal switches. So turn it off.
* Implement an optimization for interrupt preemptions, but disable it for now. I want to keep the code handy but so far my tests show no improvement in performance with huge interrupt rates (from nvme devices), so it is #undef'd for now.
|
#
405f56bc |
| 26-Jul-2016 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Minor cleanup swtch.s
* Minor cleanup
|
#
ee89e80b |
| 26-Jul-2016 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Reduce atomic ops in switch code
* Instead of using four atomic 'and' ops and four atomic 'or' ops, use one atomic 'and' and one atomic 'or' when adjusting the pmap->pm_active.
* Store the array index and simplified cpu mask in the globaldata structure for the above operation.
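The idea above, precomputing the mask-word index and single-word cpu bit in the globaldata structure so that adjusting pm_active takes one atomic op instead of one per word, can be sketched with C11 atomics. Names and layout are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* With a multi-word cpu mask, precompute (once, per-cpu) which 64-bit
 * word holds this cpu's bit and the simplified one-word mask; then a
 * single atomic 'or'/'and' on that word suffices.  Illustrative names. */
#define TOY_MASK_WORDS 4                       /* 256 cpus / 64 bits */

struct toy_globaldata {
    int      gd_cpuindex;                      /* precomputed word index */
    uint64_t gd_cpumask_simple;                /* precomputed 1-bit mask */
};

static void toy_set_active(_Atomic uint64_t *pm_active,
                           const struct toy_globaldata *gd)
{
    /* one atomic 'or' instead of one per mask word */
    atomic_fetch_or(&pm_active[gd->gd_cpuindex], gd->gd_cpumask_simple);
}

static void toy_clr_active(_Atomic uint64_t *pm_active,
                           const struct toy_globaldata *gd)
{
    /* one atomic 'and' instead of one per mask word */
    atomic_fetch_and(&pm_active[gd->gd_cpuindex], ~gd->gd_cpumask_simple);
}
```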
|
Revision tags: v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2 |
|
#
2c64e990 |
| 25-Jan-2016 |
zrj <rimvydas.jasinskas@gmail.com> |
Remove advertising header from sys/
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
Some less clear cases were handled as was done in FreeBSD.
|
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3 |
|
#
32d3bd25 |
| 06-Jan-2015 |
Sascha Wildner <saw@online.de> |
kernel/pc64: Change all the remaining #if JG's to #if 0 (fixing -Wundef).
|
Revision tags: v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2 |
|
#
06c66eb2 |
| 04-Jul-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Refactor cpumask_t to extend cpus past 64, part 2/2
* Expand SMP_MAXCPU from 64 to 256 (64-bit only)
* Expand cpumask_t from 64 to 256 bits
* Refactor the C macros and the assembly code.
* Add misc cpu_pauses()s and do a bit of work on the boot sequencing.
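Expanding cpumask_t past 64 bits means it becomes an array of words, and the C macros index into it by shifting and masking the cpu number. The toy macros below show the shape of that refactor; the real definitions live in the kernel headers and differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Toy multi-word cpumask after the 64 -> 256 expansion: cpu i lives in
 * word (i >> 6), bit (i & 63).  Simplified stand-ins for the real macros. */
#define TOY_SMP_MAXCPU 256

typedef struct { uint64_t ary[TOY_SMP_MAXCPU / 64]; } toy_cpumask_t;

#define TOY_CPUMASK_ORBIT(mask, i) \
        ((mask).ary[(i) >> 6] |= 1ULL << ((i) & 63))
#define TOY_CPUMASK_TESTBIT(mask, i) \
        (((mask).ary[(i) >> 6] >> ((i) & 63)) & 1)
```

This is also why the mask can no longer be manipulated with a single all-or-none atomic op: operations now touch one word at a time, which is exactly what the preceding CPUMASK_LOCK removal made safe.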
|
#
cc694a4a |
| 30-Jun-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Move CPUMASK_LOCK out of the cpumask_t
* Add cpulock_t (a 32-bit integer on all platforms) and implement CPULOCK_EXCL as well as space for a counter.
* Break-out CPUMASK_LOCK, add a new field to the pmap (pm_active_lock) and do the process vmm (p_vmm_cpulock) and implement the mmu interlock there.
The VMM subsystem uses additional bits in cpulock_t as a mask counter for implementing its interlock.
The PMAP subsystem just uses the CPULOCK_EXCL bit in pm_active_lock for its own interlock.
* Max cpus on 64-bit systems is now 64 instead of 63.
* cpumask_t is now just a pure cpu mask and no longer requires all-or-none atomic ops, just normal bit-for-bit atomic ops. This will allow us to hopefully extend it past the 64-cpu limit soon.
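A cpulock_t as described, a plain 32-bit integer carrying an exclusive bit plus space for a counter, can be sketched with C11 atomics. The bit layout and names here are illustrative; the kernel's actual encoding (and the VMM counter use) differs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy cpulock_t: bit 31 is the exclusive interlock (like CPULOCK_EXCL),
 * the remaining bits are available as a counter.  Illustrative layout. */
typedef uint32_t toy_cpulock_t;
#define TOY_CPULOCK_EXCL    0x80000000U   /* exclusive interlock bit */
#define TOY_CPULOCK_CNTMASK 0x7fffffffU   /* counter space */

/* Try to take the exclusive interlock; returns 1 on success. */
static int toy_cpulock_try_excl(_Atomic toy_cpulock_t *lk)
{
    return (atomic_fetch_or(lk, TOY_CPULOCK_EXCL) & TOY_CPULOCK_EXCL) == 0;
}

static void toy_cpulock_rel_excl(_Atomic toy_cpulock_t *lk)
{
    atomic_fetch_and(lk, ~TOY_CPULOCK_EXCL);
}
```

Pulling the lock out of the mask is what lets cpumask_t shrink to a pure bit mask needing only ordinary per-word atomic ops, clearing the way for the 256-cpu expansion in the later commit.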
|
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2 |
|
#
902419bf |
| 02-Apr-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix mid-kernel-boot lockups due to module order
* Sometimes the kernel would just lock up for no apparent reason. Changing the preloaded module set (even randomly) could affect the behavior.
* The problem turned out to be an issue with kernel modules needing to temporarily migrate to another cpu (such as when installing a SWI or other interrupt handler). If the idle thread on cpu 0 had not yet bootstrapped, lwkt_switch_return() would not be called properly and the LWKT cpu migration code would never schedule on the target cpu.
* Fix the problem by handling the idle thread bootstrap case for cpu 0 properly in pc64.
|
Revision tags: v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2 |
|
#
1918fc5c |
| 24-Oct-2012 |
Sascha Wildner <saw@online.de> |
kernel: Make SMP support default (and non-optional).
The 'SMP' kernel option gets removed with this commit, so it has to be removed from everybody's configs.
Reviewed-by: sjg Approved-by: many
|
Revision tags: v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
3338cc67 |
| 12-Dec-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Misc fixes and debugging
* Add required CLDs in the exception paths. The interrupt paths already CLD in PUSH_FRAME.
* Note that the fast_syscall (SYSENTER) path has an implied CLD due to the hardware mask applied to rflags.
* Add the IOPL bits to the set of bits set to 0 during a fast_syscall.
* When creating a dummy interrupt frame we don't have to push the actual %cs. Just push $0 so the frame isn't misinterpreted as coming from userland.
* Additional debug verbosity for freeze_on_seg_fault.
* Reserve two void * fields for LWP debugging (for a later commit)
|
#
121f93bc |
| 08-Dec-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add TDF_RUNNING assertions
* Assert that the target lwkt thread being switched to is not flagged as running.
* Assert that the originating lwkt thread being switched from is flagged as running.
* Fix the running flag initial condition for the idle thread.
|
#
d8d8c8c5 |
| 01-Dec-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix pmap->pm_active race in switch code
* Use an atomic cmpxchg to set the cpu bit in the pmap->pm_active bitmap AND test the pmap interlock bit at the same time, instead of testing the interlock bit afterwards.
* In addition, if we find the lock bit set and must spin-wait for it to clear, we skip the %cr3 comparison check and unconditionally load %cr3.
* It is unclear if the race could be realized in any way. It was probably not responsible for the seg-fault issue as prior tests with an unconditional load of %cr3 did not fix the problem. Plus in the same-%cr3-as-last-thread case the cpu bit is already set so there should be no possibility of losing a TLB interlock IPI (and %cr3 is loaded unconditionally when it doesn't match, so....).
But fix the race anyway.
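The set-and-test-in-one-step pattern above can be sketched with a C11 compare-exchange loop. This is a simplified user-space model, not the kernel's swtch.s code: the lock-bit position and names are invented, and the real code spin-waits on the interlock rather than just reporting it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One cmpxchg both sets our cpu bit in pm_active and observes the
 * interlock bit in the same atomic step.  Returns 1 if the interlock
 * was seen held (caller must handle it and reload %cr3 unconditionally),
 * 0 if the bit was set with the interlock verified clear.
 * Illustrative layout: bit 63 is the interlock. */
#define TOY_LOCK_BIT (1ULL << 63)

static int toy_set_active_and_test_lock(_Atomic uint64_t *pm_active,
                                        uint64_t cpu_bit)
{
    uint64_t old = atomic_load(pm_active);
    for (;;) {
        if (old & TOY_LOCK_BIT)
            return 1;        /* interlock held at the decisive instant */
        if (atomic_compare_exchange_weak(pm_active, &old, old | cpu_bit))
            return 0;        /* bit set; lock was provably clear */
        /* CAS failed: 'old' was reloaded, re-check the lock bit */
    }
}
```

The cmpxchg closes the window that existed when the bit was set first and the lock tested afterwards: here the "lock is clear" observation and the bit-set commit happen in the same atomic operation.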
|
#
0ae279a9 |
| 30-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - swtch.s cleanup
* Cleanup the code a bit, no functional changes.
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
#
f2081646 |
| 12-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add syscall quick return path for x86-64
* Flag the case where a sysretq can be performed to quickly return from a system call instead of having to execute the slower doreti code.
* This about halves syscall times for simple system calls such as getuid(), and reduces longer syscalls by ~80ns or so on a fast 3.4GHz SandyBridge, but does not seem to really affect performance a whole lot.
Taken-From: FreeBSD (loosely)
|
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0 |
|
#
b2b3ffcd |
| 04-Nov-2009 |
Simon Schubert <corecode@dragonflybsd.org> |
rename amd64 architecture to x86_64
The rest of the world seems to call amd64 x86_64. Bite the bullet and rename all of the architecture files and references. This will hopefully make pkgsrc builds less painful.
Discussed-with: dillon@
|