Revision tags: v4.6.1

# 726f7ca0 | 05-Aug-2016 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix kern.proc.pathname sysctl

* kern.proc.pathname is a sysctl used by programs to find the path of the running program. This sysctl was created before we stored sufficient information in the proc structure to construct the correct path when multiple aliases to the same file are present (due to e.g. null-mounts).
* We do have this information, in p->p_textnch, so change the sysctl to use it. The sysctl will now return the actual full path in the context of whoever ran the program, so it should properly take into account chroots and such.
Revision tags: v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc

# ced589cb | 30-May-2015 | Sepherosa Ziehau <sephe@dragonflybsd.org>

lwkt: Move LWKT objcache initialization to an earlier place

So that calling lwkt_create() without a thread template works during early boot.

LWKT is now initialized before SOFTCLOCK, since SOFTCLOCK creates per-cpu callout threads (though that creation uses the thread template).
Revision tags: v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0

# 87116512 | 17-Oct-2014 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h (3)

* Add upmap->invfork. When a vforked child is trying to access the upmap prior to exec, we must still access the parent's map and not the child's, which means that the stored PID will be incorrect.
  To fix this issue we add the invfork field, which allows userland to determine whether this is a vforked child accessing the parent's map. If it is, getpid() will use the system call.
* Fix a bug where a vfork()d child creates p->p_upmap for itself but then maps it into the parent's address space as a side effect of a getpid() or other call. When this situation is detected, /dev/upmap will use the parent's p_upmap and not the child's, and also properly set the invfork flag.
* Implement system call overrides for getpid(), setproctitle(), and clock_gettime() (*_FAST and *_SECOND clock ids). When more than 10 calls are made to one of these functions, the new libc upmap/kpmap support is activated: /dev/upmap and /dev/kpmap are memory-mapped into the address space and further accesses run through the maps instead of making system calls.
  This will obviously reduce overhead for these calls by a very significant multiplier.
* NOTE! gettimeofday() is still a system call and will likely remain one in order to return a fine-grained time value. Third-party code that doesn't need a fine-grained time value must use clock_gettime() to obtain the new performance efficiencies.
# 0adbcbd6 | 16-Oct-2014 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h

* Add two memory-mappable devices for accessing a per-process and a global kernel shared memory space. These can be mapped to acquire certain information from the kernel that would normally require a system call, in a more efficient manner.
  Userland programs using this feature should NOT directly map the sys_upmap and sys_kpmap structures (which is why they are in #ifdef _KERNEL sections in sys/upmap.h). Instead, mmap the devices using UPMAP_MAPSIZE and KPMAP_MAPSIZE and parse the ukpheader[] array at the front of each area to locate the desired fields. You can then simply cache a pointer to the desired field.
  The width of a field is encoded in the UPTYPE/KPTYPE elements and can be asserted if desired; user programs are not expected to handle integers of multiple sizes for the same field type.
* Add /dev/upmap. A program can open and mmap() this device R+W and use it to access:
  header[...] - See sys/upmap.h. An array of headers, terminated by a type=0 header, indicating where the various fields are in the mapping. This should be used by userland instead of mapping the struct sys_upmap structure directly.
  version     - The sys_upmap version, typically 1.
  runticks    - Scheduler run ticks (aggregate, all threads). This may be used by userland interpreters to determine when to soft-switch.
  forkid      - A unique non-zero 64-bit fork identifier. This is NOT a pid. It may be used by userland libraries to determine whether a fork has occurred, by comparing against a stored value.
  pid         - The current process pid. This may be used to acquire the pid without making further system calls.
  proc_title  - This starts out as an empty buffer and may be used to set the process title. To revert to the original process title, set proc_title[0] to 0.
  NOTE! Userland may write to the entire buffer, but it is recommended that userland only write to fields intended to be writable.
  NOTE! When a program forks, an area already mmap()d remains mmap()d but will point to the new process's area and not the old one, so libraries do not need to do anything special at fork.
  NOTE! Access to this structure is cpu localized.
* Add /dev/kpmap. A program can open and mmap() this device RO and use it to access:
  header[...] - See sys/upmap.h. An array of headers, terminated by a type=0 header, indicating where the various fields are in the mapping. This should be used by userland instead of mapping the struct sys_kpmap structure directly.
  version     - The sys_kpmap version, typically 1.
  upticks     - System uptime tick counter (32-bit integer). Monotonic, uncompensated.
  ts_uptime   - System uptime in struct timespec format at tick resolution. Monotonic, uncompensated.
  ts_realtime - System realtime in struct timespec format at tick resolution. This is compensated, so reverse-indexing is possible.
  tsc_freq    - If the system supports a TSC of some sort, the TSC frequency is recorded here, else 0.
  tick_freq   - The tick resolution of ts_uptime and ts_realtime, and the approximate tick resolution for the scheduler. Typically 100.
  NOTE! Userland may only read from this buffer.
  NOTE! Access to this structure is NOT cpu localized. A memory fence and double-check should be used when accessing non-atomic structures which might change, such as ts_uptime and ts_realtime.

XXX needs work.
Revision tags: v3.8.2

# c07315c4 | 04-Jul-2014 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpumask_t to extend cpus past 64, part 1/2

* 64-bit systems only. 32-bit builds use the macros but cannot be expanded past 32 cpus.
* Change cpumask_t from __uint64_t to a structure. This commit implements one 64-bit sub-element (the next one will implement four, for 256 cpus).
* Create a CPUMASK_*() macro API for non-atomic and atomic cpumask manipulation. These macros generally take lvalues as arguments, allowing for a fairly optimal implementation.
* Change all C code operating on cpumasks to use the newly created CPUMASK_*() macro API.
* Compile-test 32 and 64-bit. Run-test 64-bit.
* Adjust sbin/usched and usr.sbin/powerd. usched currently needs more work.
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc

# 3a877e44 | 21-Apr-2014 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve pid-reuse algorithm, fix bug

* Fix a bug where, under extreme loads, it was possible for a PID to be allocated twice.
* Implement a minimum pid-reuse delay of 10 seconds. No pid, session id, or pgid will be reused for at least 10 seconds after being reaped.
  This shouldn't really be necessary, but it should help scripts, particularly bulk builds, which rely on testing out-of-band PIDs with pwait.
* Increase PID_MAX from 99999 to 999999.

Reported-by: marino
Revision tags: v3.6.2

# e3790519 | 20-Mar-2014 | Joris Giovannangeli <joris@giovannangeli.fr>

kernel: check that p_ucred is not NULL before calling p_trespass
# ca3546a8 | 14-Mar-2014 | Antonio Huete Jimenez <tuxillo@quantumachine.net>

kernel - Add allproc_hsize global

- Used by kvm(3) to determine the proc hash size.
# 35c7df0f | 09-Mar-2014 | Markus Pfeiffer <markus.pfeiffer@morphism.de>

kernel: Fixup KERN_PROC_PATHNAME sysctl

The code for the sysctl uses pfind(), which PHOLDs the process when a pid is passed to the sysctl, so PRELE the process when necessary.
Revision tags: v3.6.1

# f849311b | 04-Jan-2014 | François Tigeot <ftigeot@wolfpond.org>

kernel: Add the KERN_PROC_PATHNAME sysctl

It returns the full path of a process's text file.

Obtained-from: FreeBSD
Revision tags: v3.6.0, v3.7.1, v3.6.0rc, v3.7.0

# b22b6ecd | 25-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove proc_token, replace proc, pgrp, and session structure backend

* Isolate the remaining exposed topology for proc, pgrp, and session into one source file (kern_proc.c).
* Remove allproc, zombproc, pgrp's spinlocks, and start tracking session structures so we don't have to indirect through other system structures.
* Replace with arrays-of-lists, 1024 elements each, including a 1024-element token lock array to protect each list:
  proc_tokens[1024], allprocs[1024], allpgrps[1024], allsessn[1024]
  This removes nearly all the prior proc_token contention, removes process-group processing contention, and makes it easier to track tty sessions.
* Normal processes, zombie processes, the original linear list, and the original hash mechanism are now all combined into a single allprocs[] table. The various API functions filter out zombie vs non-zombie entries based on the type of request.
* Rewrite the PID allocator to take advantage of the hashed array topology. An atomic_fetchadd_int() is used on the static base value, which causes each cpu to start at a different array entry, thus removing SMP conflicts.
  At the moment we iterate the relatively small number of elements in the bucket to find a free pid.
  Since the same proc_tokens[n] lock applies to all three arrays (proc, pgrp, and session), we can validate the pid against all three at the same time with a single lock.
* Rewrite the procs sysctl to iterate the hash table. Since there are 1024 different locks, a 'ps' or similar operation no longer has any significant effect on system performance, and 'ps' is VERY fast now regardless of the load.
* poudriere bulk build tests on a blade (4 core / 8 thread) show virtually no SMP collisions even under extreme loads.
* poudriere bulk build tests on monster (48-core opteron) show very low SMP collision statistics outside of filesystem writes in most situations. Pipes (which are already fine-grained) sometimes show significant collisions.
  Most importantly, NO collisions on the process fork/exec/exit critical path, end-to-end. Not even in the VM system.
# a8d3ab53 | 25-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - proc_token removal pass stage 1/2

* Remove proc_token use from all subsystems except kern/kern_proc.c.
* The token had become mostly useless in these subsystems now that process locking is more fine-grained. Do the final wipe of proc_token except for allproc/zombproc list use in kern_proc.c.
# 51818c08 | 24-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve vfork/exec and wait*() performance

* Use a flags interlock instead of a token interlock for PPWAIT for vfork/exec (handling when the parent must wait for the child to finish exec'ing).
* The exit1() code must wake up the parent's wait*()s. Delay the wakeup until after the token has been released.
* Change the interlock in the parent's wait*() code to use a generation counter.
* Do not wakeup p_nthreads on exit if the program was never multi-threaded, saving a few cycles.
# fa3d6eac | 23-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - proc_token performance cleanups

* pfind()/pfindn()/zpfind() now acquire proc_token shared.
* Fix a bug in alllwp_scan(): we must hold p->p_token while scanning its lwp's.
* The process list scan can use a shared token; use pfind() instead of pfindn() and remove proc_token for individual pid lookups.
* cwd can use a shared p->p_token.
* getgroups(), seteuid(), and numerous other uid/gid access and setting functions need to use p->p_token, not proc_token (Reported by enjolras).
# 849425f7 | 16-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix panic in sysctl_kern_proc()

* sysctl_kern_proc() loops through ncpus and moves the thread to each cpu in turn in order to access its local thread list.
* Fix a panic where the function does not return on the same cpu it was called on. The userland scheduler expects threads to return to usermode on the same cpu they left usermode on, and is itself responsible for moving the thread to another cpu (for userland scheduling purposes).
# e6a0f74a | 09-Oct-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix pgrp and session ref-count races

* Fix some tight timing windows where the ref count on these structures could race.
* Protect the pgrp hash table with a spinlock instead of using proc_token.
* Improve pgfind() performance by using the spinlock in shared mode.
* Do not transition p_pgrp through NULL when changing a process's pgrp. Atomically transition the process (protected by p->p_token and pg->pg_token).
# a86ce0cd | 20-Sep-2013 | Matthew Dillon <dillon@apollo.backplane.com>

hammer2 - Merge Mihai Carabas's VKERNEL/VMM GSOC project into the main tree

* This merge contains work primarily by Mihai Carabas, with some misc fixes also by Matthew Dillon.
* Special note on GSOC core:
  This is, needless to say, a huge amount of work compressed down into a few paragraphs of comments. Adds the pc64/vmm subdirectory and tons of stuff to support hardware virtualization in guest-user mode, plus the ability for programs (vkernels) running in this mode to make normal system calls to the host.
* Add system call infrastructure for VMM mode operations in kern/sys_vmm.c which vectors through a structure to machine-specific implementations:
  vmm_guest_ctl_args() - bootstrap VMM and EPT modes. Copydown the original user stack for EPT (since EPT 'physical' addresses cannot reach that far into the backing store represented by the process's original VM space). Also installs the GUEST_CR3 for the guest using parameters supplied by the guest.
  vmm_guest_sync_addr_args() - a host helper function that the vkernel can use to invalidate page tables on multiple real cpus. This is a lot more efficient than having the vkernel try to do it itself with IPI signals via cpusync*().
* Add Intel VMX support to the host infrastructure. Again, tons of work compressed down into a one-paragraph commit message. Intel VMX support added. AMD SVM support is not part of this GSOC and not yet supported by DragonFly.
* Remove PG_* defines for PTEs and related mmu operations. Replace with a table lookup so the same pmap code can be used for normal page tables and also EPT tables.
* Also include X86_PG_V defines specific to normal page tables for a few situations outside the pmap code.
* Adjust DDB to disassemble SVM related (intel) instructions.
* Add infrastructure to exit1() to deal with related structures.
* Optimize pfind() and pfindn() to remove the global token when looking up the current process's PID (Matt).
* Add support for EPT (double layer page tables). This primarily required adjusting the pmap code to use a table lookup to get the PG_* bits.
  Add an indirect vector for copyin, copyout, and other user address space copy operations to support manual walks when EPT is in use.
  A multitude of system calls which manually looked up user addresses via the vm_map now need a VMM layer call to translate EPT.
* Remove the MP lock from trapsignal() use cases in trap().
* (Matt) Add pthread_yield()s in most spin loops to help situations where the vkernel is running on more cpus than the host has, and to help with scheduler edge cases on the host.
* (Matt) Add a pmap_fault_page_quick() infrastructure that vm_fault_page() uses to try to shortcut operations and avoid locks. Implement it for pc64. This function checks whether the page is already faulted in as requested by looking up the PTE. If not, it returns NULL and the full-blown vm_fault_page() code continues running.
* (Matt) Remove the MP lock from most of the vkernel's trap() code.
* (Matt) Use a shared spinlock when possible for certain critical paths related to the copyin/copyout path.
# 8cee56f4 | 14-Sep-2013 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Misc adjustments used by the vkernel and VMM, misc optimizations

* This section is committed separately because it is basically independent of VMM.
* Improve pfind(). Don't get proc_token if the process being looked up is the current process.
* Improve kern_kill(). Do not obtain proc_token any more; p->p_token is sufficient and the process group has its own lock now.
* Call pthread_yield() when spinning on various things:
  - Spinlocks
  - Tokens (spinning in lwkt_switch)
  - cpusync (ipiq)
* Rewrite sched_yield() -> dfly_yield(). dfly_yield() will unconditionally round-robin the LWP, ignoring estcpu. It isn't perfect but it works fairly well.
  The dfly scheduler will also no longer attempt to migrate threads across cpus when handling yields. They migrate normally in all other circumstances.
  This fixes situations where the vkernel is spinning waiting for multiple events from other cpus, and in particular when it is doing a global IPI for pmap synchronization of the kernel_pmap.
Revision tags: v3.4.3

# dc71b7ab | 31-May-2013 | Justin C. Sherrill <justin@shiningsilence.com>

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadler.com>
Revision tags: v3.4.2

# 2702099d | 06-May-2013 | Justin C. Sherrill <justin@shiningsilence.com>

Remove advertising clause from all that isn't contrib or userland bin.

By: Eitan Adler <lists@eitanadler.com>
Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2

# 9072066a | 08-Dec-2012 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Adjust NFS server for new allocvnode() code

* Adjust the NFS server to check for LWP_MP_VNLRU garbage collection requests and act on them.
  This prevents excessive allocation of vnodes by the nfsd's.
# 62ae46c9 | 08-Dec-2012 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change allocvnode() to not recursively block freeing vnodes

allocvnode() has caused many deadlock issues over the years, including recent issues with softupdates, because it is often called from deep within VFS modules and attempts to clean and free unrelated vnodes when the vnode limit is reached, to make room for the new one.

* numvnodes is not protected by any locks and needs atomic ops.
* Change allocvnode() to always allocate and not attempt to free other vnodes.
* allocvnode() now flags the LWP to handle reducing the number of vnodes in the system when it returns to userland instead. Consolidate several flags into a single conditional function call, lwpuserret().
  When triggered, this code will do a limited scan of the free list to try to find vnodes to free.
* The vnlru_proc_wait() code existed to handle a separate algorithm related to vnodes with cached buffers and VM pages, but represented a major bottleneck in the system.
  Remove vnlru_proc_wait() and allow vnodes with buffers and/or non-empty VM objects to be placed on the free list.
  This also requires not vhold()ing the vnode for related buffer cache buffers, since the vnode will not go away until related buffers have been cleaned out. We shouldn't need those holds.

Testing-by: vsrinivas
# 8d67cbb3 | 06-Nov-2012 | Sascha Wildner <saw@online.de>

Fix some typos in user visible messages, etc.

Revision tags: v3.2.1
# 08a7d6d8 | 10-Oct-2012 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Adjust cache_fullpath() API

* Add another argument to explicitly specify the base directory that the path is to be relative to.

Revision tags: v3.2.0, v3.3.0, v3.0.3
# 0730ed66 | 16-Aug-2012 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix exit races which can lead to a corrupt p_children list

* There are a few races when getting multiple tokens where a threaded process is wait*()ing for exiting children from multiple threads at once.
  Fix the problem by serializing the operation on a per-child basis and by using PHOLD/PRELE prior to acquiring the child's p_token, then re-checking the conditions before accepting the child.
* There is a small chance this will also reduce or fix VM on-exit races on i386, as this bug could result in an already-destroyed process being pulled off by the racing wait*(). Maybe a 25% chance.