# cc8e70bd | 08-Oct-2024 | Sergey Zigachev <s.zi@outlook.com>
kern - Make lseek(2) generic
* Extend fileops with a fo_seek function, allowing pluggable lseek(2) implementations. Part of the preparation for the Linux DMA-BUF compat API.
* Move the current vnode lseek implementation into the vnode and devfs fileops. The code is identical in both; a note about the duplication was added.
* Set the remaining fileops to badfo_seek.
Mentored-By: dillon
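The idea behind the change can be illustrated with a small userspace sketch: a per-file-type operations table gains a seek hook, and objects with no seekable backing store point it at a stub that fails. The struct layout and the badfo_seek body below are invented for illustration; only the fo_seek and badfo_seek names come from the commit.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/types.h>

    struct file;                                /* opaque per-open-file object */

    struct fileops {                            /* illustrative layout only */
        int (*fo_seek)(struct file *fp, off_t offset, int whence, off_t *res);
    };

    /* Non-seekable objects reject lseek(2) outright. */
    static int
    badfo_seek(struct file *fp, off_t offset, int whence, off_t *res)
    {
        (void)fp; (void)offset; (void)whence; (void)res;
        return (ESPIPE);
    }

    static const struct fileops pipeops = { .fo_seek = badfo_seek };

    int
    main(void)
    {
        off_t newpos;
        int error = pipeops.fo_seek(NULL, 0, SEEK_SET, &newpos);

        printf("seek on a non-seekable object -> error %d (ESPIPE)\n", error);
        return (0);
    }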
# 0d47c594 | 23-Jan-2024 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix long-standing bug in kqueue backend for *poll*()
* The poll() family of system calls passes an fds[] array with a series of descriptors and event requests. Our kernel implementation uses kqueue, but a long-standing bug breaks situations where more than one fds[] entry in the poll corresponds to the same { ident, filter } pair in kqueue, causing only the last such entry to be registered with kqueue and breaking poll().
* Added a feature to kqueue to supply further distinction between knotes beyond the nominal { kq, filter, ident } tuple, allowing us to fix poll().
* Added a FreeBSD feature where poll() implements an implied POLLHUP when events = 0. This is used by X11 and (perhaps mistakenly) also by sshd. Our poll() previously ignored fds[] entries with events = 0.
* Note that sshd can generate poll fds[] arrays with both an events = 0 entry and an events = POLLIN entry for the same descriptor, which broke sshd when I initially added the events = 0 support due to the first bug. Now with that fixed, sshd works properly. However, it is unclear whether the authors of sshd intended events = 0 to detect POLLHUP or not.
Reported-by: servik (missing events = 0 poll feature)
Testing: servik, dillon
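A minimal userspace reproduction of the scenario described above, assuming a pipe as the shared descriptor: two fds[] entries reference the same fd, one with events = 0 (hangup detection only) and one with events = POLLIN, and both map onto the same { ident, filter } pair in the kqueue backend.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int pfd[2];

        if (pipe(pfd) == -1)
            return (1);
        write(pfd[1], "x", 1);                   /* make the read side ready */

        struct pollfd fds[2] = {
            { .fd = pfd[0], .events = 0 },       /* implied-POLLHUP watcher */
            { .fd = pfd[0], .events = POLLIN },  /* readability watcher */
        };
        int n = poll(fds, 2, 1000);

        printf("poll() = %d, revents = %#x / %#x\n",
            n, fds[0].revents, fds[1].revents);
        /* Before the fix only the last entry registered for the descriptor
         * was serviced; with the fix the POLLIN entry reports readiness. */
        return (0);
    }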
# e6bc4d0d | 28-Mar-2023 | Matthew Dillon <dillon@apollo.backplane.com>
poll/select: Fix panic in kqueue backend
* The poll and select system calls use kqueue as a backend and attempt to cache active events from prior calls to improve performance. However, this makes a potential race more likely: in a high-concurrency application, one thread close()es a descriptor that another thread had previously used in a poll/select operation, and this close() races the later poll/select operation that is attempting to remove the kevent.
* The race can sometimes prevent the poll/select kevent copyout code from removing previously cached but no-longer-used events, because the removal references the events by their descriptor rather than directly, and the descriptor is no longer valid. This causes kern_kevent() to loop infinitely and hit a panic designed to check for that situation.
* Fix the problem by moving the removal of old events from the poll/select copyout code into kqueue_scan(). kqueue_scan() can detect old unused events using the sequence id that the poll/select kernel code stores in the kevent.
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 777b8d88 | 05-Sep-2021 | Sascha Wildner <saw@online.de>
kernel: In some files, make it clearer that only 0 is returned.
The error case has already been dealt with earlier in the function.
Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# c41d0ae6 | 04-Nov-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix uninitialized variable in kqueue_register()
* Fix another uninitialized variable that gcc didn't detect, this time in kqueue_register().
* Caused kernels compiled with -O0 to not operate properly.
* Initializing the error variable to 0 solves the problem.
Reported-by: zrj
Revision tags: v5.8.3, v5.8.2
# 80d831e1 | 25-Jul-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor in-kernel system call API to remove bcopy()
* Change the in-kernel system call prototype to take the system call arguments as a separate pointer, and make the contents read-only.
      old: int sy_call_t (void *);
      new: int sy_call_t (struct sysmsg *sysmsg, const void *);
* System calls with six or fewer arguments no longer need to copy the arguments from the trapframe to a holding structure. Instead, we simply point into the trapframe.
The L1 cache footprint will be a bit smaller, but in simple tests the results are not noticeably faster... maybe 1ns or so (roughly 1%).
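A compileable userspace sketch of the new calling convention, using stand-in types; the sysmsg layout and the sys_noop/noop_args names below are invented here for illustration.

    #include <stdio.h>

    struct sysmsg { long sm_result; };          /* stand-in for the kernel type */
    struct noop_args { int value; };            /* invented argument block */

    /* New-style handler: the argument block arrives as a separate,
     * read-only pointer instead of being bcopy()d into a holding struct. */
    static int
    sys_noop(struct sysmsg *sysmsg, const struct noop_args *uap)
    {
        sysmsg->sm_result = uap->value;
        return (0);
    }

    int
    main(void)
    {
        struct sysmsg msg = { 0 };
        const struct noop_args args = { 42 };
        int error = sys_noop(&msg, &args);

        printf("error=%d result=%ld\n", error, msg.sm_result);
        return (0);
    }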
# d3b97be9 | 07-Jun-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor kern_kevent(), fix timeout overflow (ppoll() bug) (3)
* Fix a second timer overflow. The systimer clock variable is actually only 32 bits; a 10-minute timeout will overflow it. Change the kqueue timeout to 1 minute as a workaround.
(We really need to redo sysclock_t from 32 to 64 bits)
* This should finally fix both swildner's panic and rsmarples' continued early-timeout issue.
Reported-by: swildner, rsmarples
# 32c8a258 | 06-Jun-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Increase KNOTE_CACHE_MAX
* Increase KNOTE_CACHE_MAX from 8 to 64 descriptors. These are tiny descriptors, we can afford to have a larger per-cpu cache.
# 7eea59b7 | 04-Jun-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor kern_kevent(), fix timeout overflow (ppoll() bug)
* Fix a bug where large timeouts or very small timeouts could overflow the ustimeout variable. Change the internal timeout cap to ensure that no overflow occurs.
* Fix another bug where the internal timeout cap could cause ppoll() to return early. Internal tsleep (etc) timeouts need to be ignored because the external timeout might be larger and will handle the error return for us when it is checked.
* Refactor kern_kevent() to be somewhat more efficient.
Reported-by: rsmarples
Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# 759c3665 | 24-Nov-2019 | Sepherosa Ziehau <sephe@dragonflybsd.org>
kevent: Leading white space.
# bd4f774f | 18-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>
<sys/event.h>: Make M_KQUEUE static.
It is used in a single kernel source file only.
While there, minor whitespace cleanup.
# 944cd60c | 25-Sep-2019 | Sascha Wildner <saw@online.de>
<sys/time.h>: Add 3rd arg to timespecadd()/sub() and make them public.
* Switch to the three-argument versions of the timespecadd() and timespecsub() macros. These are now the predominant ones; FreeBSD, OpenBSD, NetBSD, and Solaris (albeit only for the kernel) have them.
* Make those macros public too. This allows for a number of cleanups where they were defined locally.
Pointed-out-by: zrj
Reviewed-by: dillon
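For reference, a small example of the three-argument forms, which write the result into a separate destination instead of accumulating into the first operand (on the BSDs these macros come from <sys/time.h>):

    #include <sys/time.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct timespec a = { 1, 800000000 };   /* 1.8 s */
        struct timespec b = { 0, 400000000 };   /* 0.4 s */
        struct timespec sum, diff;

        timespecadd(&a, &b, &sum);              /* 2.2 s, nanosecond carry handled */
        timespecsub(&a, &b, &diff);             /* 1.4 s, nanosecond borrow handled */
        printf("sum=%lld.%09ld diff=%lld.%09ld\n",
            (long long)sum.tv_sec, sum.tv_nsec,
            (long long)diff.tv_sec, diff.tv_nsec);
        return (0);
    }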
Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2
# eb67213a | 26-Mar-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Rewrite the callout_*() API
* Rewrite the entire API from scratch and improve compatibility with FreeBSD. This is not an attempt to achieve full API compatibility, as FreeBSD's API has unnecessary complexity that coders frequently misinterpret.
* Remove the IPI mechanisms in favor of fine-grained spin-locks instead.
* Add some robustness features in an attempt to track down corrupted callwheel lists due to originating subsystems freeing structures out from under an active callout.
* The code supports a full-blown type-stable/adhoc-reuse structural separation between the front-end and the back-end, but this feature is currently not operational and may be removed at some future point. Instead we currently just embed the struct _callout inside the struct callout.
* Replace callout_stop_sync() with callout_cancel().
* callout_drain() is now implemented as a synchronous cancel instead of an asynchronous stop, which is closer to the FreeBSD API and expected operation for ported code (usb stack in particular). We will just have to fix any deadlocks which we come across.
* Retain our callout_terminate() function as the 'better' way to stop using a callout, as it will not only cancel the callout but also de-flag the structure so it can no longer be used.
Revision tags: v5.4.1, v5.4.0
# 9292bb14 | 25-Nov-2018 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix probable callout race
* Fix a probable callout race in kern/kern_event.c. It is possible for the callout to be requeued during teardown and for the structure to subsequently become corrupted.
Manifests as 'stuck' processes (still ^C-able if PCATCH is flagged) and sleeps that do not expire. Can be triggered by synth bulk runs.
* Increase maximum number of kqueue timers from 4096 to 65536. This limit will have to be moved to uidinfo (but not in this commit).
Revision tags: v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1
# 81ac2c0d | 22-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf (2)
* Fix a lost-fp bug: a file pointer would sometimes not get dropped, leading to disconnection problems (e.g. sftp).
# d6299163 | 22-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf
* Use an eventcounter and the per-thread fd cache to fix bottlenecks in checkfdclosed(). This will work well for the vast majority of applications and test benches.
* Batch holdfp*() operations on kqueue collections when implementing poll() and select(). This significantly improves performance. Full scaling is not yet achieved, however.
* Increase copyin item batching from 8 to 32 for select() and poll().
* Give the uidinfo structure a pcpu array to hold the posixlocks and openfiles count fields, with a rollup contained in the uidinfo structure itself.
This removes numerous global bottlenecks related to open(), close(), dup*(), and lockf operations (posixlocks count).
ui_openfiles will force a rollup when the limit is reached to be sure that the limit was actually reached. ui_posixlocks stays fairly loose. Each cpu generally rolls up only when its pcpu count exceeds +32 or goes below -32.
* Give the proc structure a pcpu array for the same counts, in order to properly support seteuid() and such.
* Replace P_ADVLOCK with a char field proc->p_advlock_flag, and remove token operations around the field.
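The per-cpu counting scheme can be sketched in userspace as follows; the structure and names below are illustrative only (the real fields live in struct uidinfo), but the +/-32 rollup threshold mirrors the description above.

    #include <stdio.h>
    #include <stdlib.h>

    #define NCPU    8
    #define ROLLUP  32

    struct pcpu_count {
        long pcpu[NCPU];    /* per-cpu deltas, touched only by their cpu */
        long total;         /* loose global rollup, updated rarely */
    };

    static void
    pcpu_adjust(struct pcpu_count *pc, int cpu, int delta)
    {
        pc->pcpu[cpu] += delta;
        if (pc->pcpu[cpu] > ROLLUP || pc->pcpu[cpu] < -ROLLUP) {
            pc->total += pc->pcpu[cpu];     /* atomic in the kernel version */
            pc->pcpu[cpu] = 0;
        }
    }

    /* Exact count for limit checks: fold every per-cpu delta in. */
    static long
    pcpu_rollup(const struct pcpu_count *pc)
    {
        long sum = pc->total;

        for (int i = 0; i < NCPU; i++)
            sum += pc->pcpu[i];
        return (sum);
    }

    int
    main(void)
    {
        struct pcpu_count openfiles = { { 0 }, 0 };

        for (int i = 0; i < 1000; i++)
            pcpu_adjust(&openfiles, rand() % NCPU, 1);
        printf("loose=%ld exact=%ld\n", openfiles.total, pcpu_rollup(&openfiles));
        return (0);
    }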
# 35949930 | 20-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - per-thread fd cache, p_fd lock bypass
* Implement a per-thread (fd,fp) cache. Cache hits can keep fp's in a held state (avoiding the need to fhold()/fdrop() the ref count), and bypasses the p_fd spinlock. This allows the file pointer structure to generally be shared across cpu caches.
* Can cache up to four descriptors in each thread, LRU. This is the common case. Highly threaded programs tend to focus work on distinct file descriptors in each thread.
* One file descriptor can be cached in up to four threads. This is a significant limitation, though relatively uncommon. On a cache miss the code drops into the normal shared p_fd spinlock lookup.
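A rough userspace sketch of the per-thread (fd, fp) cache idea; the four-entry size matches the description above, but the structure, LRU bookkeeping, and names are invented for illustration.

    #include <stdio.h>

    #define FDCACHE_SIZE 4

    struct file;                        /* stand-in for the kernel struct file */

    struct fdcache_entry {
        int          fd;                /* -1 when the slot is empty */
        struct file *fp;                /* held pointer, skips fhold()/fdrop() */
        unsigned     lru;               /* larger = more recently used */
    };

    struct thread_fdcache {
        struct fdcache_entry slot[FDCACHE_SIZE];
        unsigned             clock;
    };

    /* Return the cached fp for fd, or NULL on a miss (the kernel would then
     * fall back to the shared p_fd spinlock lookup). */
    static struct file *
    fdcache_lookup(struct thread_fdcache *tc, int fd)
    {
        for (int i = 0; i < FDCACHE_SIZE; i++) {
            if (tc->slot[i].fd == fd) {
                tc->slot[i].lru = ++tc->clock;
                return (tc->slot[i].fp);
            }
        }
        return (NULL);
    }

    int
    main(void)
    {
        struct thread_fdcache tc = { { { -1, NULL, 0 }, { -1, NULL, 0 },
                                       { -1, NULL, 0 }, { -1, NULL, 0 } }, 0 };

        printf("fd 3 cached: %s\n", fdcache_lookup(&tc, 3) ? "yes" : "no");
        return (0);
    }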
Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1
# cf83cc19 | 19-Aug-2017 | Imre Vadász <imre@vdsz.com>
kqueue: Make EVFILT_USER event behaviour more consistent.
* Stop abusing the kn->kn_sfflags value for storing the current state of the EVFILT_USER filter. Instead use kn->kn_fflags like other filters. Similarly store the data value in kn->kn_data instead of kn->kn_sdata. This means that the fflags value gets reset when EV_CLEAR was specified when adding the event, and the event is received by userspace. This behaviour is consistent with existing kqueue filters, and allows using EVFILT_USER properly as an edge-triggered event when using the fflags, and not just level-triggered.
* Don't clear kn->kn_fflags when the event is modified with EV_CLEAR. Doing this wasn't affecting the actual state of the EVFILT_USER event before this change (since the state was kept in kn->kn_sfflags instead).
* All this also avoids blindly copying the fflags value that was specified when adding the event. Instead the NOTE_FFLAGSMASK mask is applied, and the NOTE_FF* options are used, so the returned fflags value should now always only have the lower 24 bits set.
* Make setting the fflags and data value when adding the event work as might be expected.
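A short example of the behaviour this change targets, assuming the usual EVFILT_USER workflow: register once with EV_CLEAR, trigger with NOTE_TRIGGER plus NOTE_FFOR to merge in user flag bits, and observe that only the low 24 bits (NOTE_FFLAGSMASK) come back and are reset after delivery.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct kevent kev;
        int kq = kqueue();

        if (kq == -1)
            return (1);

        /* register the user event, edge-triggered via EV_CLEAR */
        EV_SET(&kev, 1, EVFILT_USER, EV_ADD | EV_CLEAR, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        /* trigger it and OR a few user flag bits into the filter state */
        EV_SET(&kev, 1, EVFILT_USER, 0, NOTE_TRIGGER | NOTE_FFOR | 0x5, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        /* collect: fflags should report 0x5, then reset because of EV_CLEAR */
        int n = kevent(kq, NULL, 0, &kev, 1, NULL);

        printf("n=%d fflags=%#x data=%lld\n", n, kev.fflags, (long long)kev.data);
        return (0);
    }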
# 5596130d | 21-Aug-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix callout race and panic
* Fix a race (found by ivadasz) related to the IPI/WAIT code unblocking before the IPI is able to finish adjusting the knode and callout. The wait code was only waiting for the IPI counter to reach 0 via IPI_MASK; it also had to wait for the ARMED bit to get cleared.
* Avoid retesting c->c_flags to handle WAITING bit change races. Instead, fully integrate the test-and-clear of the WAITING bit into callout_unpend_disarm().
* Fix an issue where callout_terminate() fails to IPI the remote cpu due to the function dispatch code clearing the ARMED bit. No longer clear the ARMED bit. This ensures that a termination or stop waits for the callout to return.
This change means that synchronous callout operations to other cpus will be more expensive. However, the kernel generally does not do cross-cpu callouts any more, so it is generally a non-problem.
* Remove the now unused callout_maybe_clear_armed() inline.
* Also clear kn->kn_hook for EVFILT_TIMER when removing a callout, as a safety.
Reported-by: ivadasz (Imre Vadasz)
# 6b38d89f | 19-Aug-2017 | Imre Vadász <imre@vdsz.com>
kqueue: Fix typo in filt_userattach function: kn_fflags vs. kn_sfflags.
* This typo meant that adding an EVFILT_USER event with NOTE_TRIGGER already set would fail to trigger the user event.
* So far I haven't found any EVFILT_USER usage in open-source code where the NOTE_TRIGGER flag is set when adding the EVFILT_USER event, so this fix seems to cover a corner case in practice.
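The corner case in question, as a sketch: NOTE_TRIGGER is supplied in fflags at EV_ADD time, so the event should already be pending when it is first collected.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        struct kevent kev, out;
        struct timespec zero = { 0, 0 };
        int kq = kqueue();

        if (kq == -1)
            return (1);

        /* add and trigger in a single step */
        EV_SET(&kev, 7, EVFILT_USER, EV_ADD, NOTE_TRIGGER, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        /* non-blocking collect: with the fix this reports 1 immediately */
        int n = kevent(kq, NULL, 0, &out, 1, &zero);

        printf("pending=%d\n", n);
        return (0);
    }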
Revision tags: v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 5a4bb8ec | 09-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Incidental MPLOCK removal (non-performance)
* proc filterops.
* kernel linkerops and kld code.
* Warn if a non-MPSAFE interrupt is installed.
* Use a private token in the disk messaging core (subr_disk) instead of the mp token.
* Use a private token for sysv shm administrative calls.
* Cleanup.
Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# 6d2444c4 | 12-Dec-2015 | Imre Vadasz <imre@vdsz.com>
kernel - Implement ppoll system call with precise microseconds timeout.
* Implement a maximum timeout of 2000s, because systimer(9) accepts only an int timeout in microseconds.
* Add kern.kv_sleep_threshold sysctl variable for tuning the threshold for the ppoll sleep duration (in nanoseconds), below which we will busy-loop with DELAY instead of using tsleep for waiting.
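Usage-wise, ppoll() takes a struct timespec, so sub-millisecond waits are no longer rounded to poll()'s millisecond granularity; the 2000s cap mentioned above corresponds to 2.0e9 microseconds, just under the 2^31 - 1 limit of an int. A minimal example:

    #include <poll.h>
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        struct pollfd fds[1] = { { .fd = 0, .events = POLLIN } };
        struct timespec ts = { 0, 250 * 1000 };     /* 250 microseconds */

        /* NULL sigmask: leave the signal mask untouched */
        int n = ppoll(fds, 1, &ts, NULL);

        printf("ppoll() returned %d\n", n);
        return (0);
    }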
# 5e25370d | 29-Apr-2016 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/kern: Add kqueue EVFILT_FS
Brought in from FreeBSD@GitHub bbaa6c3ec045b7de225f726d3c9367510b287184. Needed by autofs.
Triggers an event on mount(2) and unmount(2).
Also see https://bugs.dragonflybsd.org/issues/2905.
Reviewed-by: sephe
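A minimal consumer of the new filter, assuming the FreeBSD-style usage where the ident is unused and a single registration fires on any mount or unmount:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct kevent kev;
        int kq = kqueue();

        if (kq == -1)
            return (1);

        EV_SET(&kev, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            return (1);

        /* blocks until the next mount(2) or unmount(2) on the system */
        if (kevent(kq, NULL, 0, &kev, 1, NULL) == 1)
            printf("filesystem event, fflags=%#x\n", kev.fflags);
        return (0);
    }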
# 67a73dd3 | 03-May-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>
Revert "kqueue: Avoid reprocessing processed knotes in KNOTE."
This reverts commit ed9db6a1912db34af387ff6978a265003258df16.
This caused a panic under certain network loads.
Reported-by: pavalos@
# 271ec5b5 | 03-May-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>
Revert "kqueue: Return value of knote_release is no longer useful."
This reverts commit b75b5648541ea38deaf678ee62466780ffe62374.
Prepare to revert ed9db6a1912db34af387ff6978a265003258df16, which c
Revert "kqueue: Return value of knote_release is no longer useful."
This reverts commit b75b5648541ea38deaf678ee62466780ffe62374.
Prepare to revert ed9db6a1912db34af387ff6978a265003258df16, which causes a panic under certain network loads.
Reported-by: pavalos@