#
2b3f93ea |
| 13-Oct-2023 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add per-process capability-based restrictions
* This new system allows userland to set capability restrictions which turn off numerous kernel features and root accesses. These restrictions are inherited recursively by sub-processes. Once set, restrictions cannot be removed.
Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features.
* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)
* Add sys/caps.h
* Add the "setcaps" userland utility and manual page.
* Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure.
* The intention is to add path restriction lists and similar features to improve jail-less security in the near future, and to optimize the priv_check code.
|
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1 |
|
#
949c56f8 |
| 23-Jul-2021 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Rename vm_map_wire() and vm_map_unwire()
* These names are mutant throwbacks to an earlier age and no longer mean what is implied.
* Rename vm_map_wire() to vm_map_kernel_wiring(). This function can wire and unwire VM ranges in a vm_map under kernel control. Userland has no say.
* Rename vm_map_unwire() to vm_map_user_wiring(). This function can wire and unwire VM ranges in a vm_map under user control. Userland can adjust the user wiring state for pages.
|
#
1eeaf6b2 |
| 20-May-2021 |
Aaron LI <aly@aaronly.me> |
vm: Change 'kernel_map' global to type of 'struct vm_map *'
Change the global variable 'kernel_map' from type 'struct vm_map' to a pointer to this struct. This simplifies the code a bit, since all invocations take its address. The change also aligns with NetBSD, where 'kernel_map' is likewise a pointer, which helps the porting of NVMM.
No functional changes.
|
Revision tags: v6.0.0, v6.0.0rc1, v6.1.0 |
|
#
acdf1ee6 |
| 15-Nov-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS
* Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS to procctl(2).
This follows the Linux and FreeBSD semantics; however, it should be noted that since fork() clears the setting in the child, these semantics have a fork/exit race between an exiting parent and a child which has not yet set up its death wish.
* Also fix a number of signal ranging checks.
Requested-by: zrj
|
Revision tags: v5.8.3, v5.8.2 |
|
#
80d831e1 |
| 25-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Refactor in-kernel system call API to remove bcopy()
* Change the in-kernel system call prototype to take the system call arguments as a separate pointer, and make the contents read-only.
Before: int sy_call_t(void *);
After: int sy_call_t(struct sysmsg *sysmsg, const void *);
* System calls with 6 arguments or less no longer need to copy the arguments from the trapframe to a holding structure. Instead, we simply point into the trapframe.
The L1 cache footprint will be a bit smaller, but in simple tests the results are not noticeably faster... maybe 1ns or so (roughly 1%).
|
Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1 |
|
#
01251219 |
| 14-Feb-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Start work on a better burst page-fault mechanic
* The vm.fault_quick sysctl is now a burst count. It still defaults to 1 which is the same operation as before.
Performance is roughly the same with it set anywhere from 1 to 8, as more work needs to be done to optimize pmap_enter().
|
Revision tags: v5.6.3 |
|
#
8b411d28 |
| 12-Nov-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix first-lwp access race vs process creation
* It is possible for a process to be looked up before its primary lwp is installed. Make sure this doesn't crash the kernel.
|
#
13dd34d8 |
| 18-Oct-2019 |
zrj <rimvydas.jasinskas@gmail.com> |
kernel: Cleanup <sys/uio.h> issues.
The iovec_free() inline greatly complicates this header's inclusion. The NULL check is not always seen from <sys/_null.h>. Luckily only three kernel sources need it: kern_subr.c, sys_generic.c and uipc_syscalls.c. Also, just a single dev/drm source makes use of 'struct uio'.
* Include <sys/uio.h> explicitly first in drm_fops.c to avoid the kfree() macro override in the drm compat layer.
* Use <sys/_uio.h> where only the enums and struct uio are needed, but ensure that userland will not include it, for possible later <sys/user.h> use.
* Stop using <sys/vnode.h> as a shortcut for the uiomove*() prototypes. The uiomove*() family of functions possibly transfers data across the kernel/user space boundary; this header's presence explicitly marks sources as such.
* Prefer to add <sys/uio.h> after <sys/systm.h>, but before <sys/proc.h> and definitely before <sys/malloc.h> (except for the 3 mentioned sources). This will allow removing <sys/malloc.h> from <sys/uio.h> later on.
* Adjust <sys/user.h> to use component headers instead of <sys/uio.h>.
While there, use opportunity for a minimal whitespace cleanup.
No functional differences observed in compiler intermediates.
|
#
e63f9299 |
| 06-Oct-2019 |
zrj <rimvydas.jasinskas@gmail.com> |
world: Eliminate custom uintfptr_t/fptrdiff_t types.
These were not used consistently and have visibility limitations: in types.h they are hidden under #ifndef _KERNEL, while profile.h exposes them publicly. Use cases like "uintfptr_t selfpcdiff;" and "fptrdiff_t frompci;" only confuse. Given that the underlying structs uprof and rawarc already use plain u_long types, and there are plenty of (u_long) casts elsewhere in the kernel code, follow OpenBSD and use (u_long) casts, which make clear what they do. The unused intfptr_t type does not make much sense anyway.
|
Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0 |
|
#
2a7bd4d8 |
| 18-May-2019 |
Sascha Wildner <saw@online.de> |
kernel: Don't include <sys/user.h> in kernel code.
There is really no point in doing that because its main purpose is to expose kernel structures to userland. In the majority of cases it wasn't needed at all, and the rest required only a couple of other includes.
|
Revision tags: v5.4.3 |
|
#
44293a80 |
| 09-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - VM rework part 3 - Cleanup pass
* Cleanup various structures and code
|
#
9de48ead |
| 09-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - VM rework part 2 - Replace backing_object with backing_ba
* Remove the vm_object based backing_object chains and all related chaining code.
This removes an enormous number of locks from the VM system and also removes object-to-object dependencies, which required careful traversal code. A great deal of complex code has been removed and replaced with far simpler code.
Ultimately the intention will be to support removal of pv_entry tracking from vm_pages to gain lockless shared faults, but that is far in the future. It will require hanging vm_map_backing structures off of a list based in the object.
* Implement the vm_map_backing structure which is embedded in the vm_map_entry and then links to additional dynamically allocated vm_map_backing structures via entry->ba.backing_ba. This structure contains the object and offset and essentially takes over the functionality that object->backing_object used to have.
Backing objects are now handled via vm_map_backing. In this commit, fork operations create a fan-in tree to shared subsets of backings via vm_map_backing; these subsets are not collapsed in any way.
* Remove all the vm_map_split and collapse code. Every last line is gone. It will be reimplemented using vm_map_backing in a later commit.
This means that as-of this commit both recursive forks and parent-to-multiple-children forks cause an accumulation of inefficient lists of backing objects to occur in the parent and children. This will begin to get addressed in part 3.
* The code no longer releases the vm_map lock (typically shared) across (get_pages) I/O. There are no longer any chaining locks to get in the way (hopefully). This means that the code does not have to re-check as carefully as it did before. However, some complexity will have to be added back in once we begin to address the accumulation of vm_map_backing structures.
* Paging performance improved by 30-40%
|
Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2 |
|
#
7a45978d |
| 09-Nov-2017 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix bug in vm_fault_page()
* Fix a bug in vm_fault_page() and vm_fault_page_quick(). The code is not intended to update the user pmap, but if the vm_map_lookup() results in a COW, any existing page in the underlying pmap will no longer match the page that should be there.
The user process will still work correctly in that it will fault the COW'd page if/when it tries to issue a write to that address, but userland will not have visibility into any kernel use of vm_fault_page() that modifies the page and causes a COW if the page has already been faulted in.
* Fixed by detecting the COW and at least removing the pte from the pmap to force userland to re-fault it.
* This fixes gdb operation on programs. The problem did not rear its head before because the kernel did not pre-populate as many pages in the initial exec as it does now.
* Enhance vm_map_lookup()'s &wired argument to return wflags instead, which includes FS_WIRED and also now has FS_DIDCOW.
Reported-by: profmakx
|
Revision tags: v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc |
|
#
3091de50 |
| 17-Dec-2016 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Tag vm_map_entry structure, slight optimization to zalloc, misc.
* Tag the vm_map_entry structure, allowing debugging programs to break down how KMEM is being used more easily.
This requires an additional argument to vm_map_find() and most kmem_alloc*() functions.
* Remove the page chunking parameter to zinit() and zinitna(). It was only being used degeneratively. Increase the chunking from one page to four pages, which will reduce the amount of vm_map_entry spam in the kernel_map.
* Use atomic ops when adjusting zone_kern_pages.
|
Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3 |
|
#
f5b92db7 |
| 10-Jul-2015 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix panic during coredump
* Multi-threaded coredumps were not stopping all other threads before attempting to scan the vm_map, resulting in numerous possible panics.
* Add a new process state, SCORE, indicating that a core dump is in progress and adjust proc_stop() and friends as well as any code which tests the SSTOP state. SCORE overrides SSTOP.
* The coredump code actively waits for all running threads to stop before proceeding.
* Prevent a deadlock between a SIGKILL and core dump in progress by temporarily counting the master exit thread as a stopped thread (which allows the coredump to proceed and finish).
Reported-by: marino
|
Revision tags: v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0 |
|
#
0adbcbd6 |
| 16-Oct-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h
* Add two memory-mappable devices for accessing a per-process and global kernel shared memory space. These can be mapped to acquire certain information from the kernel that would normally require a system call in a more efficient manner.
Userland programs using this feature should NOT directly map the sys_upmap and sys_kpmap structures (which is why they are in #ifdef _KERNEL sections in sys/upmap.h). Instead, mmap the devices using UPMAP_MAPSIZE and KPMAP_MAPSIZE and parse the ukpheader[] array at the front of each area to locate the desired fields. You can then simply cache a pointer to the desired field.
The width of the field is encoded in the UPTYPE/KPTYPE elements and can be asserted if desired; user programs are not expected to handle integers of multiple sizes for the same field type.
* Add /dev/upmap. A program can open and mmap() this device R+W and use it to access:
header[...] - See sys/upmap.h. An array of headers terminating with a type=0 header indicating where various fields are in the mapping. This should be used by userland instead of directly mapping to the struct sys_upmap structure.
version - The sys_upmap version, typically 1.
runticks - Scheduler run ticks (aggregate, all threads). This may be used by userland interpreters to determine when to soft-switch.
forkid - A unique non-zero 64-bit fork identifier. This is NOT a pid. This may be used by userland libraries to determine if a fork has occurred by comparing against a stored value.
pid - The current process pid. This may be used to acquire the process pid without having to make further system calls.
proc_title - This starts out as an empty buffer and may be used to set the process title. To revert to the original process title, set proc_title[0] to 0.
NOTE! Userland may write to the entire buffer, but it is recommended that userland only write to fields intended to be writable.
NOTE! When a program forks, an area already mmap()d remains mmap()d but will point to the new process's area and not the old, so libraries do not need to do anything special atfork.
NOTE! Access to this structure is cpu localized.
* Add /dev/kpmap. A program can open and mmap() this device RO and use it to access:
header[...] - See sys/upmap.h. An array of headers terminating with a type=0 header indicating where various fields are in the mapping. This should be used by userland instead of directly mapping to the struct sys_upmap structure.
version - The sys_kpmap version, typically 1.
upticks - System uptime tick counter (32 bit integer). Monotonic, uncompensated.
ts_uptime - System uptime in struct timespec format at tick-resolution. Monotonic, uncompensated.
ts_realtime - System realtime in struct timespec format at tick-resolution. This is compensated so reverse-indexing is possible.
tsc_freq - If the system supports a TSC of some sort, the TSC frequency is recorded here, else 0.
tick_freq - The tick resolution of ts_uptime and ts_realtime and approximate tick resolution for the scheduler. Typically 100.
NOTE! Userland may only read from this buffer.
NOTE! Access to this structure is NOT cpu localized. A memory fence and double-check should be used when accessing non-atomic structures which might change such as ts_uptime and ts_realtime.
XXX needs work.
|
Revision tags: v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1 |
|
#
763ff625 |
| 18-Dec-2013 |
Nicolas Thery <nthery@gmail.com> |
kernel: forbid ptrace on system processes
The scenario that triggered this change is the GDB test suite which tries to attach to process 0 (the swapper). This dereferenced a NULL pointer while reparenting the swapper to GDB as the former has no parent.
ptrace(2) is intended for debugging user processes so prevent it altogether on system processes as this is deadlock prone.
There were already calls to procfs for preventing accesses to registers of system processes. Remove the now superfluous comments but leave these calls as they may be extended someday to check for more conditions.
Dragonfly-bug: <http://bugs.dragonflybsd.org/issue2615>
|
Revision tags: v3.6.0, v3.7.1, v3.6.0rc, v3.7.0 |
|
#
a8d3ab53 |
| 25-Oct-2013 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - proc_token removal pass stage 1/2
* Remove proc_token use from all subsystems except kern/kern_proc.c.
* The token had become mostly useless in these subsystems now that process locking is more fine-grained. Do the final wipe of proc_token except for allproc/zombproc list use in kern_proc.c
|
Revision tags: v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0 |
|
#
47443c9a |
| 26-Feb-2013 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix panic on ptrace termination
* Fix a panic in the situation where gdb is exiting and terminating a ptrace, but the original parent process of the process being debugged no longer exists.
|
Revision tags: v3.2.2 |
|
#
55c81f71 |
| 28-Nov-2012 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix proc_reparent() race/assertion panic
* Fix proc_reparent() race/assertion panic. p_pptr changes can race, and the procedure had an assertion for the case. Recode the procedure to retry on a mismatch instead of asserting.
* Also move the old-parent-wakeup code into the procedure so it is properly executed in all cases.
Reported-by: Peter Avalos
|
Revision tags: v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
#
dda969a8 |
| 18-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Cleanup and document
* Cleanup and document various bits of code.
|
#
4643740a |
| 15-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Major signal path adjustments to fix races, tsleep race fixes, +more
* Refactor the signal code to properly hold the lp->lwp_token. In particular the ksignal() and lwp_signotify() paths.
* The tsleep() path must also hold lp->lwp_token to properly handle lp->lwp_stat states and interlocks.
* Refactor the timeout code in tsleep() to ensure that endtsleep() is only called from the proper context, and fix races between endtsleep() and lwkt_switch().
* Rename proc->p_flag to proc->p_flags
* Rename lwp->lwp_flag to lwp->lwp_flags
* Add lwp->lwp_mpflags and move flags which require atomic ops (are adjusted when not the current thread) to the new field.
* Add td->td_mpflags and move flags which require atomic ops (are adjusted when not the current thread) to the new field.
* Add some freeze testing code to the x86-64 trap code (default disabled).
|
#
b12defdc |
| 18-Oct-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes
This is a very large patch which reworks locking in the entire VM subsystem, concentrated on VM objects and the x86-64 pmap code. These fixes remove nearly all the spin lock contention for non-threaded VM faults and narrows contention for threaded VM faults to just the threads sharing the pmap.
Multi-socket many-core machines will see a 30-50% improvement in parallel build performance (tested on a 48-core opteron), depending on how well the build parallelizes.
As part of this work a long-standing problem on 64-bit systems where programs would occasionally seg-fault or bus-fault for no reason has been fixed. The problem was related to races between vm_fault, the vm_object collapse code, and the vm_map splitting code.
* Most uses of vm_token have been removed. All uses of vm_spin have been removed. These have been replaced with per-object tokens and per-queue (vm_page_queues[]) spin locks.
Note in particular that since we still have the page coloring code the PQ_FREE and PQ_CACHE queues are actually many queues, individually spin-locked, resulting in very excellent MP page allocation and freeing performance.
* Reworked vm_page_lookup() and vm_object->rb_memq. All (object,pindex) lookup operations are now covered by the vm_object hold/drop system, which utilize pool tokens on vm_objects. Calls now require that the VM object be held in order to ensure a stable outcome.
Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(), vm_page_busy_wait(), vm_page_busy_try(), and other API functions which integrate the PG_BUSY handling.
* Added OBJ_CHAINLOCK. Most vm_object operations are protected by the vm_object_hold/drop() facility which is token-based. Certain critical functions which must traverse backing_object chains use a hard-locking flag and lock almost the entire chain as it is traversed to prevent races against object deallocation, collapses, and splits.
The last object in the chain (typically a vnode) is NOT locked in this manner, so concurrent faults which terminate at the same vnode will still have good performance. This is important e.g. for parallel compiles which might be running dozens of the same compiler binary concurrently.
* Created a per vm_map token and removed most uses of vmspace_token.
* Removed the mp_lock in sys_execve(). It has not been needed in a while.
* Add kmem_lim_size() which returns approximate available memory (reduced by available KVM), in megabytes. This is now used to scale up the slab allocator cache and the pipe buffer caches to reduce unnecessary global kmem operations.
* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache scan code, and the pageout scan code. These routines were rewritten to use the per-queue spin locks.
* Replaced the exponential backoff in the spinlock code with something a bit less complex and cleaned it up.
* Restructured the IPIQ func/arg1/arg2 array for better cache locality. Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll, which is used by other cores to determine if they need to issue an actual hardware IPI or not. This reduces hardware IPI issuance considerably (and the removal of the decontention code reduced it even more).
* Temporarily removed the lwkt thread fairq code and disabled a number of features. These will be worked back in once we track down some of the remaining performance issues.
Temporarily removed the lwkt thread resequencer for tokens for the same reason. This might wind up being permanent.
Added splz_check()s in a few critical places.
* Increased the number of pool tokens from 1024 to 4001 and went to a prime-number mod algorithm to reduce overlaps.
* Removed the token decontention code. This was a bit of an eyesore and while it did its job when we had global locks it just gets in the way now that most of the global locks are gone.
Replaced the decontention code with a fall back which acquires the tokens in sorted order, to guarantee that deadlocks will always be resolved eventually in the scheduler.
* Introduced a simplified spin-for-a-little-while function _lwkt_trytoken_spin() that the token code now uses rather than giving up immediately.
* The vfs_bio subsystem no longer uses vm_token and now uses the vm_object_hold/drop API for buffer cache operations, resulting in very good concurrency.
* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock, which fixes a deadlock.
* Adjusted all platform pmap.c's to handle the new main kernel APIs. The i386 pmap.c is still a bit out of date but should be compatible.
* Completely rewrote very large chunks of the x86-64 pmap.c code. The critical path no longer needs pmap_spin, but pmap_spin itself is still used heavily, particularly in the pv_entry handling code.
A per-pmap token and per-pmap object are now used to serialize pmap access and vm_page lookup operations when needed.
The x86-64 pmap.c code now uses only vm_page->crit_count instead of both crit_count and hold_count, which fixes races against other parts of the kernel that use vm_page_hold().
_pmap_allocpte() mechanics have been completely rewritten to remove potential races. Much of pmap_enter() and pmap_enter_quick() has also been rewritten.
Many other changes.
* The following subsystems (and probably more) no longer use the vm_token or vmobj_token in critical paths:
x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.
x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API instead of vm_token.
x vnode_pager
x zalloc
x vm_page handling
x vfs_bio
x umtx system calls
x vm_fault and friends
* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps) revealed by recent global lock removals.
* lockmgr() locks no longer support LK_NOSPINWAIT. Spin locks are unconditionally acquired.
* Replaced netif/e1000's spinlocks with lockmgr locks. The spinlocks were not appropriate owing to the large context they were covering.
* Misc atomic ops added
|
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0, v2.3.2, v2.3.1, v2.2.1, v2.2.0, v2.3.0 |
|
#
895c1f85 |
| 15-Dec-2008 |
Michael Neumann <mneumann@ntecs.de> |
suser_* to priv_* conversion
|