#
2b3f93ea |
| 13-Oct-2023 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add per-process capability-based restrictions
* This new system allows userland to set capability restrictions which turns off numerous kernel features and root accesses. These restricti
kernel - Add per-process capability-based restrictions
* This new system allows userland to set capability restrictions which turns off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed.
Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features.
* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)
* Add sys/caps.h
* Add the "setcaps" userland utility and manual page.
* Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure.
* The intention is to add path restriction lists and similar features to improve jailess security in the near future, and to optimize the priv_check code.
show more ...
|
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2 |
|
#
ef866ef7 |
| 25-Jul-2020 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Remove P_SWAPPEDOUT flag and paging mode
* This code basically no longer functions in any worthwhile or useful manner, remove it.
The code harkens back to a time when machines had very
kernel - Remove P_SWAPPEDOUT flag and paging mode
* This code basically no longer functions in any worthwhile or useful manner, remove it.
The code harkens back to a time when machines had very little memory and had to time-share processes by actually descheduling them for long periods of time (like 20 seconds) and paging out the related memory.
In modern times the chooser algorithm just doesn't work well because we can no longer assume that programs with large memory footprints can be demoted.
* In modern times machines have sufficient memory to rely almost entirely on the VM fault and pageout scan. The latencies caused by fault-ins are usually sufficient to demote paging-intensive processes while allowing the machine to continue to function.
If functionality need to be added back in, it can be added back in on the fault path and not here.
show more ...
|
Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3 |
|
#
e7126f0a |
| 12-Nov-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel and libc - Reimplement lwp_setname*() using /dev/lpmap
* Generally speaking we are implementing the features necessary to allow per-thread titling set via pthread_set_name_np() to show up
kernel and libc - Reimplement lwp_setname*() using /dev/lpmap
* Generally speaking we are implementing the features necessary to allow per-thread titling set via pthread_set_name_np() to show up in 'ps' output, and to use lpmap to make it fast.
* The lwp_setname() system call now stores the title in lpmap->thread_title[].
* Implement a libc fast-path for lwp_setname() using lpmap. If called more than 10 times, libc will use lpmap for any further calls, which omits the need to make any system calls.
* setproctitle() now stores the title in upmap->proc_title[] instead of replacing proc->p_args. proc->p_args is now no longer modified from its original contents.
* The kernel now includes lpmap->thread_title[] in the following priority order when retrieving the process command line:
lpmap->thread_title[] User-supplied thread title, if not empty upmap->proc_title[] User-supplied process title, if not empty proc->p_args Original process arguments (no longer modified)
* Put the TID in /dev/lpmap for convenient access
* Enhance the KERN_PROC_ARGS sysctl to allow the TID to be specified. The sysctl now accepts { KERN_PROC, KERN_PROC_ARGS, pid, tid } in addition to the existing { KERN_PROC, KERN_PROC_ARGS, pid } mechanism.
Enhance libkvm to use the new feature. libkvm will fall-back to the old version if necessary.
show more ...
|
#
13dd34d8 |
| 18-Oct-2019 |
zrj <rimvydas.jasinskas@gmail.com> |
kernel: Cleanup <sys/uio.h> issues.
The iovec_free() inline very complicates this header inclusion. The NULL check is not always seen from <sys/_null.h>. Luckily only three kernel sources needs
kernel: Cleanup <sys/uio.h> issues.
The iovec_free() inline very complicates this header inclusion. The NULL check is not always seen from <sys/_null.h>. Luckily only three kernel sources needs it: kern_subr.c, sys_generic.c and uipc_syscalls.c. Also just a single dev/drm source makes use of 'struct uio'. * Include <sys/uio.h> explicitly first in drm_fops.c to avoid kfree() macro override in drm compat layer. * Use <sys/_uio.h> where only enums and struct uio is needed, but ensure that userland will not include it for possible later <sys/user.h> use. * Stop using <sys/vnode.h> as shortcut for uiomove*() prototypes. The uiomove*() family functions possibly transfer data across kernel/user space boundary. This header presence explicitly mark sources as such. * Prefer to add <sys/uio.h> after <sys/systm.h>, but before <sys/proc.h> and definitely before <sys/malloc.h> (except for 3 mentioned sources). This will allow to remove <sys/malloc.h> from <sys/uio.h> later on. * Adjust <sys/user.h> to use component headers instead of <sys/uio.h>.
While there, use opportunity for a minimal whitespace cleanup.
No functional differences observed in compiler intermediates.
show more ...
|
Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0 |
|
#
0adbcbd6 |
| 16-Oct-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h
* Add two memory-mappable devices for accessing a per-process and global kernel shared memory space. These can be mapped to acquire certain
kernel - Add /dev/upmap and /dev/kpmap and sys/upmap.h
* Add two memory-mappable devices for accessing a per-process and global kernel shared memory space. These can be mapped to acquire certain information from the kernel that would normally require a system call in a more efficient manner.
Userland programs using this feature should NOT directly map the sys_upmap and sys_kpmap structures (which is why they are in #ifdef _KERNEL sections in sys/upmap.h). Instead, mmap the devices using UPMAP_MAPSIZE and KPMAP_MAPSIZE and parse the ukpheader[] array at the front of each area to locate the desired fields. You can then simply cache a pointer to the desired field.
The width of the field is encoded in the UPTYPE/KPTYPE elements and can be asserted if desired, user programs are not expected to handle integers of multiple sizes for the same field type.
* Add /dev/upmap. A program can open and mmap() this device R+W and use it to access:
header[...] - See sys/upmap.h. An array of headers terminating with a type=0 header indicating where various fields are in the mapping. This should be used by userland instead of directly mapping to the struct sys_upmap structure.
version - The sys_upmap version, typically 1.
runticks - Scheduler run ticks (aggregate, all threads). This may be used by userland interpreters to determine when to soft-switch.
forkid - A unique non-zero 64-bit fork identifier. This is NOT a pid. This may be used by userland libraries to determine if a fork has occurred by comparing against a stored value.
pid - The current process pid. This may be used to acquire the process pid without having to make further system calls.
proc_title - This starts out as an empty buffer and may be used to set the process title. To revert to the original process title, set proc_title[0] to 0.
NOTE! Userland may write to the entire buffer, but it is recommended that userland only write to fields intended to be writable.
NOTE! When a program forks, an area already mmap()d remains mmap()d but will point to the new process's area and not the old, so libraries do not need to do anything special atfork.
NOTE! Access to this structure is cpu localized.
* Add /dev/kpmap. A program can open and mmap() this device RO and use it to access:
header[...] - See sys/upmap.h. An array of headers terminating with a type=0 header indicating where various fields are in the mapping. This should be used by userland instead of directly mapping to the struct sys_upmap structure.
version - The sys_kpmap version, typically 1.
upticks - System uptime tick counter (32 bit integer). Monotonic, uncompensated.
ts_uptime - System uptime in struct timespec format at tick-resolution. Monotonic, uncompensated.
ts_realtime - System realtime in struct timespec format at tick-resolution. This is compensated so reverse-indexing is possible.
tsc_freq - If the system supports a TSC of some sort, the TSC frequency is recorded here, else 0.
tick_freq - The tick resolution of ts_uptime and ts_realtime and approximate tick resolution for the scheduler. Typically 100.
NOTE! Userland may only read from this buffer.
NOTE! Access to this structure is NOT cpu localized. A memory fence and double-check should be used when accessing non-atomic structures which might change such as ts_uptime and ts_realtime.
XXX needs work.
show more ...
|
Revision tags: v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3 |
|
#
dc71b7ab |
| 31-May-2013 |
Justin C. Sherrill <justin@shiningsilence.com> |
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069
Submitted-by: "Eitan Adler" <lists at eitanadl
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069
Submitted-by: "Eitan Adler" <lists at eitanadler.com>
show more ...
|
Revision tags: v3.4.2 |
|
#
2702099d |
| 06-May-2013 |
Justin C. Sherrill <justin@shiningsilence.com> |
Remove advertising clause from all that isn't contrib or userland bin.
By: Eitan Adler <lists@eitanadler.com>
|
Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
4090d6ff |
| 03-Jan-2012 |
Sascha Wildner <saw@online.de> |
kernel: Use NULL for pointers.
|
#
884717e1 |
| 06-Dec-2011 |
Sascha Wildner <saw@online.de> |
kernel: Replace all usage of MALLOC()/FREE() with kmalloc()/kfree().
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
#
47538602 |
| 17-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - more procfs work
* uiomove_frombuf() takes care of indexing uio_offset and checking its range for us so we don't have to do it ourselves, clean up use cases in procfs.
* Generate somew
kernel - more procfs work
* uiomove_frombuf() takes care of indexing uio_offset and checking its range for us so we don't have to do it ourselves, clean up use cases in procfs.
* Generate somewhat more consistent text output for /proc/<pid>/map by formatting the map entry range with static widths.
* ps_nargvstr is a signed number, do a better range check on it.
show more ...
|
#
4643740a |
| 15-Nov-2011 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Major signal path adjustments to fix races, tsleep race fixes, +more
* Refactor the signal code to properly hold the lp->lwp_token. In particular the ksignal() and lwp_signotify() paths.
kernel - Major signal path adjustments to fix races, tsleep race fixes, +more
* Refactor the signal code to properly hold the lp->lwp_token. In particular the ksignal() and lwp_signotify() paths.
* The tsleep() path must also hold lp->lwp_token to properly handle lp->lwp_stat states and interlocks.
* Refactor the timeout code in tsleep() to ensure that endtsleep() is only called from the proper context, and fix races between endtsleep() and lwkt_switch().
* Rename proc->p_flag to proc->p_flags
* Rename lwp->lwp_flag to lwp->lwp_flags
* Add lwp->lwp_mpflags and move flags which require atomic ops (are adjusted when not the current thread) to the new field.
* Add td->td_mpflags and move flags which require atomic ops (are adjusted when not the current thread) to the new field.
* Add some freeze testing code to the x86-64 trap code (default disabled).
show more ...
|
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0 |
|
#
e54488bb |
| 19-Aug-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
AMD64 - Refactor uio_resid and size_t assumptions.
* uio_resid changed from int to size_t (size_t == unsigned long equivalent).
* size_t assumptions in most kernel code has been refactored to opera
AMD64 - Refactor uio_resid and size_t assumptions.
* uio_resid changed from int to size_t (size_t == unsigned long equivalent).
* size_t assumptions in most kernel code has been refactored to operate in a 64 bit environment.
* In addition, the 2G limitation for VM related system calls such as mmap() has been removed in 32 bit environments. Note however that because read() and write() return ssize_t, these functions are still limited to a 2G byte count in 32 bit environments.
show more ...
|
Revision tags: v2.3.2, v2.3.1, v2.2.1, v2.2.0, v2.3.0 |
|
#
75bda2d9 |
| 15-Dec-2008 |
Michael Neumann <mneumann@ntecs.de> |
Fix missing includes
|
Revision tags: v2.1.1, v2.0.1 |
|
#
c7e98b2f |
| 19-Feb-2007 |
Simon Schubert <corecode@dragonflybsd.org> |
1:1 Userland threading stage 2.18/4:
Push lwp use a bit further by making some places lwp aware. This commit deals with ddb, procfs/ptrace and various consumers of allproc_scan.
|
#
08f2f1bb |
| 03-Feb-2007 |
Simon Schubert <corecode@dragonflybsd.org> |
1:1 Userland threading stage 2.11/4:
Move signals into lwps, take p_lwp out of proc.
Originally-Submitted-by: David Xu <davidxu@freebsd.org> Reviewed-by: Thomas E. Spanjaard <tgen@netphreax.net>
|
#
fde7ac71 |
| 01-Jan-2007 |
Simon Schubert <corecode@dragonflybsd.org> |
1:1 Userland threading stage 2.10/4:
Separate p_stats into p_ru and lwp_ru.
proc.p_ru keeps track of all statistics directly related to a proc. This consists of RSS usage and nswap information and
1:1 Userland threading stage 2.10/4:
Separate p_stats into p_ru and lwp_ru.
proc.p_ru keeps track of all statistics directly related to a proc. This consists of RSS usage and nswap information and aggregate numbers for all former lwps of this proc.
proc.p_cru is the sum of all stats of reaped children.
lwp.lwp_ru contains the stats directly related to one specific lwp, meaning packet, scheduler switch or page fault counts, etc. This information gets added to lwp.lwp_proc.p_ru when the lwp exits.
show more ...
|
#
f8c7a42d |
| 20-Dec-2006 |
Matthew Dillon <dillon@dragonflybsd.org> |
Rename sprintf -> ksprintf Rename snprintf -> knsprintf
Make allowances for source files that are compiled for both userland and the kernel.
|
#
344ad853 |
| 14-Nov-2005 |
Matthew Dillon <dillon@dragonflybsd.org> |
Make tsleep/wakeup() MP SAFE for kernel threads and get us closer to making it MP SAFE for user processes. Currently the code is operating under the rule that access to a thread structure requires c
Make tsleep/wakeup() MP SAFE for kernel threads and get us closer to making it MP SAFE for user processes. Currently the code is operating under the rule that access to a thread structure requires cpu locality of reference, and access to a proc structure requires the Big Giant Lock. The two are not mutually exclusive so, for example, tsleep/wakeup on a proc needs both cpu locality of reference *AND* the BGL. This was true with the old tsleep/wakeup and has now been documented.
The new tsleep/wakeup algorithm is quite simple in concept. Each cpu has its own ident based hash table and each hash slot has a cpu mask which tells wakeup() which cpu's might have the ident. A wakeup iterates through all candidate cpus simply by chaining the IPI message through them until either all candidate cpus have been serviced, or (with wakeup_one()) the requested number of threads have been woken up.
Other changes made in this patch set:
* The sense of P_INMEM has been reversed. It is now P_SWAPPEDOUT. Also, P_SWAPPING, P_SWAPINREQ are not longer relevant and have been removed.
* The swapping code has been cleaned up and seriously revamped. The new swapin code staggers swapins to give the VM system a chance to respond to new conditions. Also some lwp-related fixes were made (more p_rtprio vs lwp_rtprio confusion).
* As mentioned above, tsleep/wakeup have been rewritten. The process p_stat no longer does crazy transitions from SSLEEP to SSTOP. There is now only SSLEEP and SSTOP is synthesized from P_SWAPPEDOUT for userland consumpion. Additionally, tsleep() with PCATCH will NO LONGER STOP THE PROCESS IN THE TSLEEP CALL. Instead, the actual stop is deferred until the process tries to return to userland. This removes all remaining cases where a stopped process can hold a locked kernel resource.
* A P_BREAKTSLEEP flag has been added. This flag indicates when an event occurs that is allowed to break a tsleep with PCATCH. All the weird undocumented setrunnable() rules have been removed and replaced with a very simple algorithm based on this flag.
* Since the UAREA is no longer swapped, we no longer faultin() on PHOLD(). This also incidently fixes the 'ps' command's tendancy to try to swap all processes back into memory.
* speedup_syncer() no longer does hackish checks on proc0's tsleep channel (td_wchan).
* Userland scheduler acquisition and release has now been tightened up and KKASSERT's have been added (one of the bugs Stefan found was related to an improper lwkt_schedule() that was found by one of the new assertions). We also have added other assertions related to expected conditions.
* A serious race in pmap_release_free_page() has been corrected. We no longer couple the object generation check with a failed pmap_release_free_page() call. Instead the two conditions are checked independantly. We no longer loop when pmap_release_free_page() succeeds (it is unclear how that could ever have worked properly).
Major testing by: Stefan Krueger <skrueger@meinberlikomm.de>
show more ...
|
#
d9fa5f67 |
| 08-Oct-2005 |
Simon Schubert <corecode@dragonflybsd.org> |
1:1 Userland threading stage 2.4/4:
Introduce p_start and use it. At the moment td_start is zero for all kernel threads (no change) and processes (changed). This field will be filled in a later co
1:1 Userland threading stage 2.4/4:
Introduce p_start and use it. At the moment td_start is zero for all kernel threads (no change) and processes (changed). This field will be filled in a later commit again.
show more ...
|
#
aa0a27c8 |
| 29-Jan-2005 |
Matthew Dillon <dillon@dragonflybsd.org> |
Fix the virtual 'status' file for procfs. The wrong length was being used, returning a 0-length result every time.
Reported-by: "Simon 'corecode' Schubert" <corecode@fs.ei.tum.de>
|
#
d80771bb |
| 01-Dec-2004 |
Joerg Sonnenberger <joerg@dragonflybsd.org> |
Don't read userland pointers directly, copy them first into kernel land and verify the location.
Security-fix for CAN-2004-1066 (FreeBSD-SA-04:17.procfs).
Submitted-by: Colin Percival <colin.perciv
Don't read userland pointers directly, copy them first into kernel land and verify the location.
Security-fix for CAN-2004-1066 (FreeBSD-SA-04:17.procfs).
Submitted-by: Colin Percival <colin.percival@wadham.ox.ac.uk> Credits: Bryan Fulton, Ted Unangst, and the SWAT analysis tool Coverity, Inc.
show more ...
|
#
41f3429e |
| 20-Jun-2004 |
Hiten Pandya <hmp@dragonflybsd.org> |
Move the 'p_start' field from struct pstats (Process Statistics) into the thread structure and call it 'td_start'. The behavior of vm_fork(9) is retained, i.e., it still copies the start time from t
Move the 'p_start' field from struct pstats (Process Statistics) into the thread structure and call it 'td_start'. The behavior of vm_fork(9) is retained, i.e., it still copies the start time from the parent process just as it did before.
The 'td_start' will later be used by pure threads to indicate their start time. It has not been committed in this round because use of the microtime() function at such a early point in the boot process might be unsafe.
Note, there should be no problem in accessing the td_start field, unless the process is a Zombie; due to the way Zombies are reaped, the thread will be decoupled in kern_wait1() but the process will still be around for a while it will not be possible to access the td_start field in such scenarios. A little note about this has been added on top of struct proc in <sys/proc.h> for future reference.
This work was a collaboration of Hiten Pandya <hmp@backplane.com> and Matthew Dillon <dillon@apollo.backplane.com>
show more ...
|
#
ac424f9b |
| 02-May-2004 |
Chris Pressey <cpressey@dragonflybsd.org> |
Style(9) cleanup to src/sys/vfs, stage 15/21: procfs.
- Convert K&R-style function definitions to ANSI style.
Submitted-by: Andre Nathan <andre@digirati.com.br> Additional-reformatting-by: cpressey
|
#
25e80b06 |
| 02-Oct-2003 |
David Rhodus <drhodus@dragonflybsd.org> |
Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9).
Use uiomove_frombuf in pro
Introduce a uiomove_frombuf helper routine that handles computing and validating the offset within a given memory buffer before handing the real work off to uiomove(9).
Use uiomove_frombuf in procfs to correct several issues with integer arithmetic that could result in underflows/overflows. As a side-effect, the code is significantly simplified.
Add additional sanity checks when computing a memory allocation size in pfs_read.
Reported by: Joost Pol <joost@pine.nl> (integer underflows/overflows) Originated from: FreeBSD
show more ...
|