History log of /dragonfly/sys/kern/sys_process.c (Results 1 – 25 of 65)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 2b3f93ea 13-Oct-2023 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions which
turns off numerous kernel features and root accesses. These restricti

kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions which
turns off numerous kernel features and root accesses. These restrictions
are inherited by sub-processes recursively. Once set, restrictions cannot
be removed.

Basic restrictions that mimic an unadorned jail can be enabled without
creating a jail, but generally speaking real security also requires
creating a chrooted filesystem topology, and a jail is still needed
to really segregate processes from each other. If you do so, however,
you can (for example) disable mount/umount and most global root-only
features.

* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)

* Add sys/caps.h

* Add the "setcaps" userland utility and manual page.

* Remove priv.9 and the priv_check infrastructure, replacing it with
a newly designed caps infrastructure.

* The intention is to add path restriction lists and similar features to
improve jailess security in the near future, and to optimize the
priv_check code.

show more ...


Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 949c56f8 23-Jul-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Rename vm_map_wire() and vm_map_unwire()

* These names are mutant throwbacks to an earlier age and no
longer mean what is implied.

* Rename vm_map_wire() to vm_map_kernel_wiring(). This

kernel - Rename vm_map_wire() and vm_map_unwire()

* These names are mutant throwbacks to an earlier age and no
longer mean what is implied.

* Rename vm_map_wire() to vm_map_kernel_wiring(). This function can
wire and unwire VM ranges in a vm_map under kernel control. Userland
has no say.

* Rename vm_map_unwire() to vm_map_user_wiring(). This function can
wire and unwire VM ranges in a vm_map under user control. Userland
can adjust the user wiring state for pages.

show more ...


# 1eeaf6b2 20-May-2021 Aaron LI <aly@aaronly.me>

vm: Change 'kernel_map' global to type of 'struct vm_map *'

Change the global variable 'kernel_map' from type 'struct vm_map' to a
pointer to this struct. This simplify the code a bit since all
inv

vm: Change 'kernel_map' global to type of 'struct vm_map *'

Change the global variable 'kernel_map' from type 'struct vm_map' to a
pointer to this struct. This simplify the code a bit since all
invocations take its address. This change also aligns with NetBSD's
'kernal_map' that it's also a pointer, which also helps the porting of
NVMM.

No functional changes.

show more ...


Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# acdf1ee6 15-Nov-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS

* Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS to procctl(2).

This follows the linux and freebsd semantics, however it should be note

kernel - Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS

* Add PROC_PDEATHSIG_CTL and PROC_PDEATHSIG_STATUS to procctl(2).

This follows the linux and freebsd semantics, however it should be noted
that since the child of a fork() clears the setting, these semantics have
a fork/exit race between an exiting parent and a child which has not
yet setup its death wish.

* Also fix a number of signal ranging checks.

Requested-by: zrj

show more ...


Revision tags: v5.8.3, v5.8.2
# 80d831e1 25-Jul-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the
system call arguments as a separate pointer, and make the
contents read-onl

kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the
system call arguments as a separate pointer, and make the
contents read-only.

int sy_call_t (void *);
int sy_call_t (struct sysmsg *sysmsg, const void *);

* System calls with 6 arguments or less no longer need to copy
the arguments from the trapframe to a holding structure. Instead,
we simply point into the trapframe.

The L1 cache footprint will be a bit smaller, but in simple tests
the results are not noticably faster... maybe 1ns or so
(roughly 1%).

show more ...


Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1
# 01251219 14-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Start work on a better burst page-fault mechanic

* The vm.fault_quick sysctl is now a burst count. It still
defaults to 1 which is the same operation as before.

Performance is roughly

kernel - Start work on a better burst page-fault mechanic

* The vm.fault_quick sysctl is now a burst count. It still
defaults to 1 which is the same operation as before.

Performance is roughly the same with it set to 1 to 8 as
more work needs to be done to optimize pmap_enter().

show more ...


Revision tags: v5.6.3
# 8b411d28 12-Nov-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix first-lwp access race vs process creation

* It is possible for a process to be looked up before its primary
lwp is installed. Make sure this doesn't crash the kernel.


# 13dd34d8 18-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

kernel: Cleanup <sys/uio.h> issues.

The iovec_free() inline very complicates this header inclusion. The
NULL check is not always seen from <sys/_null.h>. Luckily only three
kernel sources needs

kernel: Cleanup <sys/uio.h> issues.

The iovec_free() inline very complicates this header inclusion. The
NULL check is not always seen from <sys/_null.h>. Luckily only three
kernel sources needs it: kern_subr.c, sys_generic.c and uipc_syscalls.c.
Also just a single dev/drm source makes use of 'struct uio'.
* Include <sys/uio.h> explicitly first in drm_fops.c to avoid kfree()
macro override in drm compat layer.
* Use <sys/_uio.h> where only enums and struct uio is needed, but ensure
that userland will not include it for possible later <sys/user.h> use.
* Stop using <sys/vnode.h> as shortcut for uiomove*() prototypes. The
uiomove*() family functions possibly transfer data across kernel/user
space boundary. This header presence explicitly mark sources as such.
* Prefer to add <sys/uio.h> after <sys/systm.h>, but before <sys/proc.h>
and definitely before <sys/malloc.h> (except for 3 mentioned sources).
This will allow to remove <sys/malloc.h> from <sys/uio.h> later on.
* Adjust <sys/user.h> to use component headers instead of <sys/uio.h>.

While there, use opportunity for a minimal whitespace cleanup.

No functional differences observed in compiler intermediates.

show more ...


# e63f9299 06-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

world: Eliminate custom uintfptr_t/fptrdiff_t types.

These were not used consistently and have visibility limitations,
types.h - #indef _KERNEL while profile.h - publicly. Use cases like:
"uint

world: Eliminate custom uintfptr_t/fptrdiff_t types.

These were not used consistently and have visibility limitations,
types.h - #indef _KERNEL while profile.h - publicly. Use cases like:
"uintfptr_t selfpcdiff;" and "fptrdiff_t frompci;" only confuse.
Given that underlying structs uprof, rawarc already use plain u_long
types, there are plenty (u_long) casts elsewhere in the kernel code,
follow OpenBSD and use use (u_long) casts that are clear what they do.
The unused intfptr_t type does not make much sense anyway.

show more ...


Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0
# 2a7bd4d8 18-May-2019 Sascha Wildner <saw@online.de>

kernel: Don't include <sys/user.h> in kernel code.

There is really no point in doing that because its main purpose is to
expose kernel structures to userland. The majority of cases wasn't
needed at

kernel: Don't include <sys/user.h> in kernel code.

There is really no point in doing that because its main purpose is to
expose kernel structures to userland. The majority of cases wasn't
needed at all and the rest required only a couple of other includes.

show more ...


Revision tags: v5.4.3
# 44293a80 09-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 3 - Cleanup pass

* Cleanup various structures and code


# 9de48ead 09-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 2 - Replace backing_object with backing_ba

* Remove the vm_object based backing_object chains and all related
chaining code.

This removes an enormous number of locks fro

kernel - VM rework part 2 - Replace backing_object with backing_ba

* Remove the vm_object based backing_object chains and all related
chaining code.

This removes an enormous number of locks from the VM system and
also removes object-to-object dependencies which requires careful
traversal code. A great deal of complex code has been removed
and replaced with far simpler code.

Ultimately the intention will be to support removal of pv_entry
tracking from vm_pages to gain lockless shared faults, but that
is far in the future. It will require hanging vm_map_backing
structures off of a list based in the object.

* Implement the vm_map_backing structure which is embedded in the
vm_map_entry and then links to additional dynamically allocated
vm_map_backing structures via entry->ba.backing_ba. This structure
contains the object and offset and essentially takes over the
functionality that object->backing_object used to have.

backing objects are now handled via vm_map_backing. In this
commit, fork operations create a fan-in tree to shared subsets
of backings via vm_map_backing. In this particular commit,
these subsets are not collapsed in any way.

* Remove all the vm_map_split and collapse code. Every last line
is gone. It will be reimplemented using vm_map_backing in a
later commit.

This means that as-of this commit both recursive forks and
parent-to-multiple-children forks cause an accumulation of
inefficient lists of backing objects to occur in the parent
and children. This will begin to get addressed in part 3.

* The code no longer releases the vm_map lock (typically shared)
across (get_pages) I/O. There are no longer any chaining locks to
get in the way (hopefully). This means that the code does not
have to re-check as carefully as it did before. However, some
complexity will have to be added back in once we begin to address
the accumulation of vm_map_backing structures.

* Paging performance improved by 30-40%

show more ...


Revision tags: v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2
# 7a45978d 09-Nov-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix bug in vm_fault_page()

* Fix a bug in vm_fault_page() and vm_fault_page_quick(). The code
is not intended to update the user pmap, but if the vm_map_lookup()
results in a COW, any

kernel - Fix bug in vm_fault_page()

* Fix a bug in vm_fault_page() and vm_fault_page_quick(). The code
is not intended to update the user pmap, but if the vm_map_lookup()
results in a COW, any existing page in the underlying pmap will no
longer match the page that should be there.

The user process will still work correctly in that it will fault the
COW'd page if/when it tries to issue a write to that address, but
userland will not have visibility to any kernel use of vm_fault_page()
that modifies the page and causes a COW if the page has already been
faulted in.

* Fixed by detecting the COW and at least removing the pte from the pmap
to force userland to re-fault it.

* This fixes gdb operation on programs. The problem did not rear its
head before because the kernel did not pre-populate as many pages in the
initial exec as it does now.

* Enhance vm_map_lookup()'s &wired argument to return wflags instead,
which includes FS_WIRED and also now has FS_DIDCOW.

Reported-by: profmakx

show more ...


Revision tags: v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 3091de50 17-Dec-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Tag vm_map_entry structure, slight optimization to zalloc, misc.

* Tag the vm_map_entry structure, allowing debugging programs to
break-down how KMEM is being used more easily.

This re

kernel - Tag vm_map_entry structure, slight optimization to zalloc, misc.

* Tag the vm_map_entry structure, allowing debugging programs to
break-down how KMEM is being used more easily.

This requires an additional argument to vm_map_find() and most
kmem_alloc*() functions.

* Remove the page chunking parameter to zinit() and zinitna(). It was
only being used degeneratively. Increase the chunking from one page
to four pages, which will reduce the amount of vm_map_entry spam in
the kernel_map.

* Use atomic ops when adjusting zone_kern_pages.

show more ...


Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3
# f5b92db7 10-Jul-2015 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix panic during coredump

* multi-threaded coredumps were not stopping all other threads before
attempting to scan the vm_map, resulting in numerous possible panics.

* Add a new process

kernel - Fix panic during coredump

* multi-threaded coredumps were not stopping all other threads before
attempting to scan the vm_map, resulting in numerous possible panics.

* Add a new process state, SCORE, indicating that a core dump is in progress
and adjust proc_stop() and friends as well as any code which tests the
SSTOP state. SCORE overrides SSTOP.

* The coredump code actively waits for all running threads to stop before
proceeding.

* Prevent a deadlock between a SIGKILL and core dump in progress by
temporarily counting the master exit thread as a stopped thread (which
allows the coredump to proceed and finish).

Reported-by: marino

show more ...


Revision tags: v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0
# 0adbcbd6 16-Oct-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h

* Add two memory-mappable devices for accessing a per-process and global
kernel shared memory space. These can be mapped to acquire certain

kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h

* Add two memory-mappable devices for accessing a per-process and global
kernel shared memory space. These can be mapped to acquire certain
information from the kernel that would normally require a system call
in a more efficient manner.

Userland programs using this feature should NOT directly map the sys_upmap
and sys_kpmap structures (which is why they are in #ifdef _KERNEL sections
in sys/upmap.h). Instead, mmap the devices using UPMAP_MAPSIZE and
KPMAP_MAPSIZE and parse the ukpheader[] array at the front of each area
to locate the desired fields. You can then simply cache a pointer to
the desired field.

The width of the field is encoded in the UPTYPE/KPTYPE elements and
can be asserted if desired, user programs are not expected to handle
integers of multiple sizes for the same field type.

* Add /dev/upmap. A program can open and mmap() this device R+W and use
it to access:

header[...] - See sys/upmap.h. An array of headers terminating with
a type=0 header indicating where various fields are in
the mapping. This should be used by userland instead
of directly mapping to the struct sys_upmap structure.

version - The sys_upmap version, typically 1.

runticks - Scheduler run ticks (aggregate, all threads). This
may be used by userland interpreters to determine
when to soft-switch.

forkid - A unique non-zero 64-bit fork identifier. This is NOT a
pid. This may be used by userland libraries to determine
if a fork has occurred by comparing against a stored
value.

pid - The current process pid. This may be used to acquire the
process pid without having to make further system calls.

proc_title - This starts out as an empty buffer and may be used to set
the process title. To revert to the original process title,
set proc_title[0] to 0.

NOTE! Userland may write to the entire buffer, but it is recommended
that userland only write to fields intended to be writable.

NOTE! When a program forks, an area already mmap()d remains mmap()d but
will point to the new process's area and not the old, so libraries
do not need to do anything special atfork.

NOTE! Access to this structure is cpu localized.

* Add /dev/kpmap. A program can open and mmap() this device RO and use
it to access:

header[...] - See sys/upmap.h. An array of headers terminating with
a type=0 header indicating where various fields are in
the mapping. This should be used by userland instead
of directly mapping to the struct sys_upmap structure.

version - The sys_kpmap version, typically 1.

upticks - System uptime tick counter (32 bit integer). Monotonic,
uncompensated.

ts_uptime - System uptime in struct timespec format at tick-resolution.
Monotonic, uncompensated.

ts_realtime - System realtime in struct timespec format at tick-resolution.
This is compensated so reverse-indexing is possible.

tsc_freq - If the system supports a TSC of some sort, the TSC
frequency is recorded here, else 0.

tick_freq - The tick resolution of ts_uptime and ts_realtime and
approximate tick resolution for the scheduler. Typically
100.

NOTE! Userland may only read from this buffer.

NOTE! Access to this structure is NOT cpu localized. A memory fence
and double-check should be used when accessing non-atomic structures
which might change such as ts_uptime and ts_realtime.

XXX needs work.

show more ...


Revision tags: v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1
# 763ff625 18-Dec-2013 Nicolas Thery <nthery@gmail.com>

kernel: forbid ptrace on system processes

The scenario that triggered this change is the GDB test suite which
tries to attach to process 0 (the swapper). This dereferenced a NULL
pointer while repa

kernel: forbid ptrace on system processes

The scenario that triggered this change is the GDB test suite which
tries to attach to process 0 (the swapper). This dereferenced a NULL
pointer while reparenting the swapper to GDB as the former has no
parent.

ptrace(2) is intended for debugging user processes so prevent it
altogether on system processes as this is deadlock prone.

There were already calls to procfs for preventing accesses to registers
of system processes. Remove the now superfluous comments but leave
these calls as they may be extended someday to check for more
conditions.

Dragonfly-bug: <http://bugs.dragonflybsd.org/issue2615>

show more ...


Revision tags: v3.6.0, v3.7.1, v3.6.0rc, v3.7.0
# a8d3ab53 25-Oct-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - proc_token removal pass stage 1/2

* Remove proc_token use from all subsystems except kern/kern_proc.c.

* The token had become mostly useless in these subsystems now that process
locking

kernel - proc_token removal pass stage 1/2

* Remove proc_token use from all subsystems except kern/kern_proc.c.

* The token had become mostly useless in these subsystems now that process
locking is more fine-grained. Do the final wipe of proc_token except for
allproc/zombproc list use in kern_proc.c

show more ...


Revision tags: v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0
# 47443c9a 26-Feb-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix panic on ptrace termination

* Fix a panic in the situation where gdb is exiting and terminating
a ptrace, but the original parent prpocess of the process being
debugged no longer ex

kernel - Fix panic on ptrace termination

* Fix a panic in the situation where gdb is exiting and terminating
a ptrace, but the original parent prpocess of the process being
debugged no longer exists.

show more ...


Revision tags: v3.2.2
# 55c81f71 28-Nov-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix proc_reparent() race/assertion panic

* Fix proc_reparent() race/assertion panic. p_pptr changes can race,
and the procedure had an assertion for the case. Recode the procedure
to

kernel - Fix proc_reparent() race/assertion panic

* Fix proc_reparent() race/assertion panic. p_pptr changes can race,
and the procedure had an assertion for the case. Recode the procedure
to retry on a mismatch instead of assert.

* Also move the old-parent-wakeup code into the procedure so it is
properly executed in all cases.

Reported-by: Peter Avalos

show more ...


Revision tags: v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 86d7f5d3 26-Nov-2011 John Marino <draco@marino.st>

Initial import of binutils 2.22 on the new vendor branch

Future versions of binutils will also reside on this branch rather
than continuing to create new binutils branches for each new version.


# dda969a8 18-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Cleanup and document

* Cleanup and document various bits of code.


# 4643740a 15-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major signal path adjustments to fix races, tsleep race fixes, +more

* Refactor the signal code to properly hold the lp->lwp_token. In
particular the ksignal() and lwp_signotify() paths.

kernel - Major signal path adjustments to fix races, tsleep race fixes, +more

* Refactor the signal code to properly hold the lp->lwp_token. In
particular the ksignal() and lwp_signotify() paths.

* The tsleep() path must also hold lp->lwp_token to properly handle
lp->lwp_stat states and interlocks.

* Refactor the timeout code in tsleep() to ensure that endtsleep() is only
called from the proper context, and fix races between endtsleep() and
lwkt_switch().

* Rename proc->p_flag to proc->p_flags

* Rename lwp->lwp_flag to lwp->lwp_flags

* Add lwp->lwp_mpflags and move flags which require atomic ops (are adjusted
when not the current thread) to the new field.

* Add td->td_mpflags and move flags which require atomic ops (are adjusted
when not the current thread) to the new field.

* Add some freeze testing code to the x86-64 trap code (default disabled).

show more ...


# b12defdc 18-Oct-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pma

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pmap code. These fixes remove
nearly all the spin lock contention for non-threaded VM faults and narrows
contention for threaded VM faults to just the threads sharing the pmap.

Multi-socket many-core machines will see a 30-50% improvement in parallel
build performance (tested on a 48-core opteron), depending on how well
the build parallelizes.

As part of this work a long-standing problem on 64-bit systems where programs
would occasionally seg-fault or bus-fault for no reason has been fixed. The
problem was related to races between vm_fault, the vm_object collapse code,
and the vm_map splitting code.

* Most uses of vm_token have been removed. All uses of vm_spin have been
removed. These have been replaced with per-object tokens and per-queue
(vm_page_queues[]) spin locks.

Note in particular that since we still have the page coloring code the
PQ_FREE and PQ_CACHE queues are actually many queues, individually
spin-locked, resulting in very excellent MP page allocation and freeing
performance.

* Reworked vm_page_lookup() and vm_object->rb_memq. All (object,pindex)
lookup operations are now covered by the vm_object hold/drop system,
which utilize pool tokens on vm_objects. Calls now require that the
VM object be held in order to ensure a stable outcome.

Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
vm_page_busy_wait(), vm_page_busy_try(), and other API functions
which integrate the PG_BUSY handling.

* Added OBJ_CHAINLOCK. Most vm_object operations are protected by
the vm_object_hold/drop() facility which is token-based. Certain
critical functions which must traverse backing_object chains use
a hard-locking flag and lock almost the entire chain as it is traversed
to prevent races against object deallocation, collapses, and splits.

The last object in the chain (typically a vnode) is NOT locked in
this manner, so concurrent faults which terminate at the same vnode will
still have good performance. This is important e.g. for parallel compiles
which might be running dozens of the same compiler binary concurrently.

* Created a per vm_map token and removed most uses of vmspace_token.

* Removed the mp_lock in sys_execve(). It has not been needed in a while.

* Add kmem_lim_size() which returns approximate available memory (reduced
by available KVM), in megabytes. This is now used to scale up the
slab allocator cache and the pipe buffer caches to reduce unnecessary
global kmem operations.

* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
scan code, and the pageout scan code. These routines were rewritten
to use the per-queue spin locks.

* Replaced the exponential backoff in the spinlock code with something
a bit less complex and cleaned it up.

* Restructured the IPIQ func/arg1/arg2 array for better cache locality.
Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
which is used by other cores to determine if they need to issue an
actual hardware IPI or not. This reduces hardware IPI issuance
considerably (and the removal of the decontention code reduced it even
more).

* Temporarily removed the lwkt thread fairq code and disabled a number of
features. These will be worked back in once we track down some of the
remaining performance issues.

Temproarily removed the lwkt thread resequencer for tokens for the same
reason. This might wind up being permanent.

Added splz_check()s in a few critical places.

* Increased the number of pool tokens from 1024 to 4001 and went to a
prime-number mod algorithm to reduce overlaps.

* Removed the token decontention code. This was a bit of an eyesore and
while it did its job when we had global locks it just gets in the way now
that most of the global locks are gone.

Replaced the decontention code with a fall back which acquires the
tokens in sorted order, to guarantee that deadlocks will always be
resolved eventually in the scheduler.

* Introduced a simplified spin-for-a-little-while function
_lwkt_trytoken_spin() that the token code now uses rather than giving
up immediately.

* The vfs_bio subsystem no longer uses vm_token and now uses the
vm_object_hold/drop API for buffer cache operations, resulting
in very good concurrency.

* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
which fixes a deadlock.

* Adjusted all platform pamp.c's to handle the new main kernel APIs. The
i386 pmap.c is still a bit out of date but should be compatible.

* Completely rewrote very large chunks of the x86-64 pmap.c code. The
critical path no longer needs pmap_spin but pmap_spin itself is still
used heavily, particularin the pv_entry handling code.

A per-pmap token and per-pmap object are now used to serialize pmamp
access and vm_page lookup operations when needed.

The x86-64 pmap.c code now uses only vm_page->crit_count instead of
both crit_count and hold_count, which fixes races against other parts of
the kernel uses vm_page_hold().

_pmap_allocpte() mechanics have been completely rewritten to remove
potential races. Much of pmap_enter() and pmap_enter_quick() has also
been rewritten.

Many other changes.

* The following subsystems (and probably more) no longer use the vm_token
or vmobj_token in critical paths:

x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.

x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
instead of vm_token.

x vnode_pager

x zalloc

x vm_page handling

x vfs_bio

x umtx system calls

x vm_fault and friends

* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
revealed by recent global lock removals.

* lockmgr() locks no longer support LK_NOSPINWAIT. Spin locks are
unconditionally acquired.

* Replaced netif/e1000's spinlocks with lockmgr locks. The spinlocks
were not appropriate owing to the large context they were covering.

* Misc atomic ops added

show more ...


Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0, v2.3.2, v2.3.1, v2.2.1, v2.2.0, v2.3.0
# 895c1f85 15-Dec-2008 Michael Neumann <mneumann@ntecs.de>

suser_* to priv_* conversion


123