6d0742ae | 20-Oct-2023 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Rearrange struct vmmeter (requires world and kernel build)
* Expand v_lock_name from 16 to 32 bytes
* Add v_lock_addr field to go along with v_lock_name. These fields report SMP contention.
* Rearrange vmmeter_uint_end to not include v_lock_name or v_lock_addr.
* Clean up the do_vmmeter_pcpu() sysctl code. Remove the useless aggregation code and just do a structural copy of the per-cpu gd_cnt (struct vmmeter) structure.
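
A minimal sketch of what the simplified handler can look like; the argument plumbing (how the handler locates the target cpu's globaldata) is an assumption, not taken from the commit:

    static int
    do_vmmeter_pcpu(SYSCTL_HANDLER_ARGS)
    {
        /* assumption: the target cpu's globaldata arrives via arg1 */
        struct globaldata *gd = arg1;
        struct vmmeter vmm;

        vmm = gd->gd_cnt;   /* one structural copy, no per-field work */
        return (SYSCTL_OUT(req, &vmm, sizeof(vmm)));
    }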

2b3f93ea | 13-Oct-2023 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add per-process capability-based restrictions
* This new system allows userland to set capability restrictions which turn off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed.
Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features.
* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)
* Add sys/caps.h
* Add the "setcaps" userland utility and manual page.
* Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure.
* The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
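
A minimal usage sketch follows; the syscap_set(2) signature and the SYSCAP_NOMOUNT capability name are illustrative assumptions, so consult sys/caps.h and the syscap_set(2) manual page for the real interface:

    #include <sys/types.h>
    #include <sys/caps.h>
    #include <err.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* assumed capability: restrict mount/umount for this process */
        if (syscap_set(SYSCAP_NOMOUNT, 0, NULL, 0) < 0)
            err(1, "syscap_set");

        /* the restriction is permanent and inherited by all children */
        execl("/bin/sh", "sh", (char *)NULL);
        err(1, "execl");
    }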

Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0

aab1a048 | 24-Nov-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix shared spin-lock starvation bug in VMs
* 'indefinite_uses_rdtsc' is set to zero by default when running in a virtual machine (and set to 1 on a real machine). However, this disables the windowing code in _spin_lock_shared_contested() and causes it to defer to pending exclusive lock requests indefinitely under heavy-enough loads.
* Add a comment and always use a windowing test w/rdtsc() in _spin_lock_shared_contested().
We were trying to avoid using the rdtsc() in VMs because some of them apparently trap the rdtsc instruction. However, this puts us in a no-win situation when it comes to dealing with spin-locks. So take the hit and start using rdtsc again in some situations when operating in a VM.
Reported-by: mjg

2a404fe0 | 02-Nov-2020 | zrj <rimvydas.jasinskas@gmail.com>

kernel: Avoid spurious diagnostic with older gcc47.

Revision tags: v5.8.3, v5.8.2, v5.8.1

4badc135 | 11-Mar-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Do not use rdtsc() in the spinlock loop when virtualized
* When running as a guest, do not use rdtsc() in the spinlock loop as numerous HVM subsystems will trap-out on the instruction.
Reported-by: mjg
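
A sketch of how such a default can be chosen at boot, assuming the policy keys off DragonFly's hypervisor-detection global (the exact hook point is not shown in this log):

    /* assumption: choose the rdtsc policy from the detected hypervisor */
    if (vmm_guest == VMM_GUEST_NONE)
        indefinite_uses_rdtsc = 1;  /* bare metal: rdtsc is cheap */
    else
        indefinite_uses_rdtsc = 0;  /* guest: rdtsc may trap to the HVM */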

d033fb32 | 03-Mar-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Integrate the counter & 1 trick into the spinlock API
* The counter trick allows read accessors to sample a data structure without any further locks or ref-counts, as long as the data structure is free-safe.
It does not necessarily protect pointers within the data structure so e.g. ref'ing some sub-structure via a data structure pointer is not safe on its own unless the sub-structure is able to provide some sort of additional guarantee.
* Our struct spinlock has always been 8 bytes, but only uses 4 bytes for the lock. Implement the new API using the second field.
Accessor side: spin_update_start() spin_update_end()
Modifier side: spin_lock_update() spin_unlock_update()
* On the accessor side, if spin_update_start() detects a change in progress it will obtain a shared spin-lock, else it remains unlocked.
spin_update_end() tells the caller whether it must retry the operation, i.e. if a change occurred between start and end. This can only happen if spin_update_start() remained unlocked.
If the start did a shared lock then no changes are assumed to have occurred and spin_update_end() will release the shared spinlock and return 0.
* On the modifier side, spin_lock_update() obtains an exclusive spinlock and increments the update counter, making it odd ((spin->update & 1) != 0).
spin_unlock_update() increments the counter again, making it even but different, and releases the exclusive spinlock.
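
A usage sketch of the pattern these four calls enable. The exact signatures (what spin_update_start() returns and what spin_update_end() takes) are assumptions reconstructed from the description above:

    struct foo {
        struct spinlock spin;
        int a, b;               /* free-safe data sampled by readers */
    };

    /* accessor: stays lock-free unless a modification is in progress */
    void
    foo_sample(struct foo *fp, int *ra, int *rb)
    {
        int locked;

        do {
            locked = spin_update_start(&fp->spin);
            *ra = fp->a;
            *rb = fp->b;
        } while (spin_update_end(&fp->spin, locked)); /* retry on race */
    }

    /* modifier: exclusive lock plus odd/even counter transitions */
    void
    foo_modify(struct foo *fp, int a, int b)
    {
        spin_lock_update(&fp->spin);    /* counter becomes odd */
        fp->a = a;
        fp->b = b;
        spin_unlock_update(&fp->spin);  /* even again, but different */
    }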

Revision tags: v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3

e8b1691f | 15-Dec-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Test pending ints in more crit_exit*() paths
* A number of crit_exit*() paths, primarily in the mutex and spinlock code, were not testing for interrupts made pending on the last unwind of the critical section.
This was originally intended to improve performance, but it can lead to non-deterministic latencies for processing interrupts.
* Process these pending events in such cases. We will see if it affects performance but I don't think it will be noticeable.
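
A sketch of the shape of such a check on the final unwind; the function name is illustrative and the reqflags test is simplified:

    static __inline void
    _crit_exit_check(thread_t td)
    {
        /* on the last unwind, run interrupts made pending meanwhile */
        if (--td->td_critcount == 0 && td->td_gd->gd_reqflags)
            lwkt_maybe_splz(td);
    }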

eb8c4738 | 02-Nov-2019 | zrj <rimvydas.jasinskas@gmail.com>

vkernel64: Reduce <pthread.h> exposure to generic kernel sources.
Implement vkernel_yield() wrapper and use it where needed.
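
A minimal sketch of what such a wrapper can look like, assuming it simply forwards to the pthread yield primitive (the actual body is not shown in this log):

    /* compiled in one platform file so that generic kernel sources
     * never need to include <pthread.h> themselves */
    #include <pthread.h>

    void
    vkernel_yield(void)
    {
        pthread_yield();
    }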

Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3

288f331f | 09-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Quick pass add __read_frequently
* Do a quick pass to add __read_frequently to certain specific globals.
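
For illustration, the annotation is applied like this (whether this particular global was among the ones touched is an assumption; it does appear elsewhere in this log):

    /* keep read-mostly globals on cachelines writers rarely dirty */
    int indefinite_uses_rdtsc __read_frequently = 1;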

Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1

a18b747c | 04-May-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix spinlock bug introduced with windowing (2)
* Fix the fix for the spinlock bug to be the actual fix and not a bad dream.
Reported-by: zrj

c5cfe2c8 | 02-May-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix spinlock bug introduced with windowing
* The exclusive spinlock contention code was improperly assuming that non-zero EXCLWAIT bits prevented the SHARED bit from being set. This is no longer true, shared locks can sometimes override EXCLWAIT. This assumption could result in spin_lock() returning with a shared lock instead of an exclusive lock.
* Fixed by ensuring that the SHARED bit is cleared when resolving the contended exclusive lock.
* Should hopefully fix the pmap pv == NULL assertion.
Reported-by: dillon, zrj

cc705b82 | 23-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Carefully refactor contended tokens and spinlocks
* Carefully put the exponential backoff back in for spinlocks, and implement it for tokens. Only applicable to exclusive locks, capped via sysctl (mjg). Tested on a dual-socket Xeon.
* Exclusive priority for shared locks reduces the shared/exclusive starvation that can occur when exclusive locks use exponential backoff.
* Exponential backoff significantly improves performance for heavily contended exclusive locks by allowing some degree of burst operation.
* Implement TSC windowing for shared locks (and a little for exclusive locks too). This prevents heavily contended exclusive locks from completely starving shared locks by using windowing to disable the exclusive-priority mechanic for shared locks.
This allows a few contending shared locks to compete on equal ground with exclusive locks.
Suggested-by: mjg
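
A sketch of the two mechanics combined; every identifier here (spin_try_excl, spin_backoff_cap, WINDOW_SHIFT) is a placeholder for illustration, not the committed code:

    /* exclusive spinner: capped exponential backoff */
    int backoff = 1;

    while (!spin_try_excl(&spin->lock)) {
        for (int i = 0; i < backoff; ++i)
            cpu_pause();
        if (backoff < spin_backoff_cap)     /* cap set via sysctl */
            backoff <<= 1;
    }

    /* shared spinner: a TSC-derived window periodically disables the
     * exclusive-priority mechanic so shared locks cannot be starved */
    int may_ignore_exclwait = (rdtsc() >> WINDOW_SHIFT) & 1;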

97cfa330 | 18-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Handle spinlock indefinite wait edge case
* The spinlock exclusive priority mechanism can cause an indefinite wait situation for shared locks to arise when a large number of cpu cores are cycling the same spinlock both shared and exclusive.
This situation just won't happen for any real workload, but it can come up in benchmarks.
* Introduce a quick hack to ensure that this situation does not lead to a panic. The exclusive priority mechanism is ignored once a shared spinlock has spun for greater than one second.

9abb66c5 | 17-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve spinlock performance a bit
* Rearrange indefinite_init() and cpu_pause() in _spin_lock_contested() and _spin_lock_shared_contested() to improve performance.
* Fix the conditional clearing of the SHARED bit to use ovalue instead of value. After review, either can be used, but ovalue is more appropriate and gives us an interlock against SPINLOCK_EXCLWAIT.
Reported-by: mjg_

Revision tags: v5.2.0, v5.3.0, v5.2.0rc

ae4025a1 | 17-Mar-2018 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve spinlock performance
* Primarily improve spinlock performance when transitioning from an exclusive to a shared lock by allowing atomic_fetchadd_int() to be used instead of atomic_cmpset_int().
* Also clean up a few remaining atomic_cmpset_int() cases that can use atomic_fcmpset_int() instead.
Suggested-by: mjg
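
The difference between the two loop styles, as a generic sketch rather than the literal spinlock code (NEWBITS and the lock field name are placeholders):

    u_int ovalue, nvalue;

    /* cmpset style: re-read the lock word after every failed attempt */
    do {
        ovalue = spin->lock;
        nvalue = ovalue | NEWBITS;
    } while (!atomic_cmpset_int(&spin->lock, ovalue, nvalue));

    /* fcmpset style: a failed op deposits the current value in ovalue,
     * saving one read per iteration */
    ovalue = spin->lock;
    do {
        nvalue = ovalue | NEWBITS;
    } while (!atomic_fcmpset_int(&spin->lock, &ovalue, nvalue));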

Revision tags: v5.0.2, v5.0.1, v5.0.0

b1793cc6 | 05-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor smp collision statistics (2)
* Refactor indefinite_info mechanics. Instead of tracking indefinite loops on a per-thread basis for tokens, track them on a scheduler basis. The scheduler records the overhead while it is live-looping on tokens, but the moment it finds a thread it can actually schedule it stops (then restarts later the next time it is entered), even if some of the other threads still have unresolved tokens.
This gives us a fairer representation of how many cpu cycles are actually being wasted waiting for tokens.
* Go back to using a local indefinite_info in the lockmgr*(), mutex*(), and spinlock code.
* Refactor lockmgr() by implementing an __inline frontend to interpret the directive. Since this argument is usually a constant, the change effectively removes the switch().
Use LK_NOCOLLSTATS to create a clean recursion to wrap the blocking case with the indefinite*() API.

1b8fb8d2 | 05-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Optimize shared -> excl spinlock contention
* When an exclusive request is spinning waiting for shared holders to release, throw in additional cpu_pause()s based on the number of shared holders.
Suggested-by: mjg_
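
As a sketch of the idea (the mask name is a placeholder): the more shared holders remain, the longer each exclusive-side iteration pauses:

    /* scale the pause with the current shared-holder count */
    count = value & SPINLOCK_SHARED_MASK;   /* placeholder mask name */
    while (count--)
        cpu_pause();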

5b49787b | 05-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor smp collision statistics
* Add an indefinite wait timing API (sys/indefinite.h, sys/indefinite2.h). This interface uses the TSC and will record lock latencies to our pcpu stats in microseconds. The systat -pv 1 display shows this under smpcoll.
Note that latencies generated by tokens, lockmgr, and mutex locks do not necessarily reflect actual lost cpu time as the kernel will schedule other threads while those are blocked, if other threads are available.
* Formalize TSC operations more, supplying types (tsc_uclock_t and tsc_sclock_t).
* Reinstrument lockmgr, mutex, token, and spinlocks to use the new indefinite timing interface.
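
A hedged usage sketch of the shape such an API takes in a contended loop; only the header names and the TSC/microsecond basis come from the commit text, the argument lists are assumptions:

    #include <sys/indefinite2.h>

    indefinite_info_t info;

    indefinite_init(&info, ident, 0, 'l');  /* begin TSC-based timing */
    while (!lock_attempt(lk))               /* placeholder lock loop */
        indefinite_check(&info);            /* spin bookkeeping */
    indefinite_done(&info);                 /* fold latency into pcpu stats */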

a4d95680 | 03-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix GCC reordering problem with td_critcount
* Wrap all ++td->td_critcount and --td->td_critcount use cases with an inline which executes cpu_ccfence() before and after, to guarantee that GCC does not try to reorder the operation around critical memory changes.
* This fixes a race in lockmgr() and possibly a few other places too.
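
The wrapper described above has roughly this shape (the function name is illustrative; cpu_ccfence() is the compiler-fence primitive named in the commit):

    /* fence on both sides of the bump so GCC cannot migrate the
     * increment across critical-section boundaries */
    static __inline void
    crit_enter_raw(thread_t td)
    {
        cpu_ccfence();
        ++td->td_critcount;
        cpu_ccfence();
    }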

Revision tags: v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc

e22f2acd | 30-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix shared/exclusive spinlock race
* Fix a long-standing bug in the shared spinlock code which could unintentionally cause a contending exclusive waiter to acquire its lock shared.
* Fixes a pmap issue exercised by the vkernel.
* The namecache also uses shared spinlocks but was far less likely to hit the bug.

6ba5daf8 | 08-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Move vm_page spin locks from pool to vm_page structure
* Move the vm_page spin lock from a pool to per-structure. This does bloat the vm_page structure, but clears up an area of contention under heavy VM loads.

cff27bad | 05-Dec-2016 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Spiff up locks a bit
* Do a little optimization of _spin_lock_contested(). The critical path is able to avoid two atomic ops in the initialization portion of the contested path.
* Optimize _spin_lock_shared_contested() to use atomic_fetchadd_long() to add a shared-lock count instead of atomic_cmpset_long(). Shared spinlocks are used heavily and this will prevent a lot of unnecessary spinning when many cpus are using the same lock at the same time.
* Hold fdp->fd_spin across fdp->fd_cdir and fdp->fd_ncdir modifications. This completes other work which caches fdp->fd_ncdir and avoids having to obtain the spin-lock when the cache matches.
Discussed-with: Mateusz Guzik (mjg_)

01be7a8f | 04-Dec-2016 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Reduce spinning on shared spinlocks
* Improve spinlock performance by removing unnecessary extra reads, using atomic_fetchadd_int() to avoid a cmpxchg loop, and allowing the SHARED flag to remain soft-set on the 1->0 transition.
* The primary improvement here is that multiple cpus obtaining the same shared spinlock can now do so via a single atomic_fetchadd_int(), whereas before we had multiple atomics and cmpxchg loops. This does not remove the cacheline ping-pong but it significantly reduces unnecessary looping when multiple cpu cores are heavily loading the same shared spin lock.
* Trade-off is against the case where a spinlock's use-case switches from shared to exclusive or back again, which requires an extra atomic op to deal with. This is not a common case.
* Remove spin->countb debug code, it interferes with hw cacheline operations and is no longer desirable.
Discussed-with: Mateusz Guzik (mjg_)
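
A sketch of the fast shared path this enables; SPINLOCK_SHARED is the real flag name, but the surrounding logic is reconstructed from the description rather than copied:

    /* one atomic add joins the shared count, even when SHARED was left
     * soft-set by an earlier 1->0 transition */
    ovalue = atomic_fetchadd_int(&spin->lock, 1);
    if (ovalue & SPINLOCK_SHARED)
        return;                             /* shared hold acquired */
    _spin_lock_shared_contested(spin);      /* resolve the hard way */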

Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0

ba87a4ab | 24-Aug-2014 | Sascha Wildner <saw@online.de>

kernel/spinlock: Add a description to struct spinlock.
And add it to spin_init() and SPINLOCK_INITIALIZER().
Submitted-by: dclink (see <http://bugs.dragonflybsd.org/issues/2714>)
OK'd-by: dillon
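
Illustrative usage after this change (the argument order is assumed from the description):

    /* both initializers now carry a human-readable description */
    struct spinlock foo_spin = SPINLOCK_INITIALIZER(&foo_spin, "foospin");

    void
    foo_init(struct foo *fp)
    {
        spin_init(&fp->spin, "foo");
    }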

Revision tags: v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0

050032ec | 08-Nov-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve SMP collision statistics
* Populate the per-cpu collision counter and label from the spinlock, lockmgr lock, and mutex code. The token code already used it.
* Pass __func__ to the spinlock routines so it can be copied into the per-cpu collision label.
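
A common way to thread __func__ through without touching every caller is a macro frontend; this exact shape is an assumption, not the literal change:

    /* callers keep writing spin_lock(spin); the macro stamps in the
     * caller's function name for the per-cpu collision label */
    #define spin_lock(spin)         _spin_lock(spin, __func__)
    #define spin_lock_shared(spin)  _spin_lock_shared(spin, __func__)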