History log of /dragonfly/sys/kern/kern_event.c (Results 1 – 25 of 147)
# 0d47c594 23-Jan-2024 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix long-standing bug in kqueue backend for *poll*()

* The poll() family of system calls passes an fds[] array with a
series of descriptors and event requests. Our kernel implementation
uses kqueue, but a long-standing bug breaks situations where
more than one fds[] entry for the poll corresponds to the same
{ ident, filter } for kqueue, causing only the last such entry
to be registered with kqueue and breaking poll().

* Added feature to kqueue to supply further distinctions between
knotes beyond the nominal { kq, filter, ident } tuple, allowing
us to fix poll().

* Added a FreeBSD feature where poll() implements an implied POLLHUP
when events = 0. This is used by X11 and (perhaps mistakenly) also
by sshd. Our poll previously ignored fds[] entries with events = 0.

* Note that sshd can generate poll fds[] arrays with both an events = 0
and an events = POLLIN for the same descriptor, which broke sshd
when I initially added the events = 0 support due to the first bug.

Now with that fixed, sshd works properly. However, it is unclear whether
the authors of sshd intended events = 0 to detect POLLHUP or not.

Reported-by: servik (missing events = 0 poll feature)
Testing: servik, dillon
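
An illustrative userspace sketch (not part of the commit) of the two
cases described above: two fds[] entries that collapse to the same
kqueue { ident, filter } pair, one of them with events = 0 that should
still report POLLHUP once the peer goes away. The exact revents bits
are platform-dependent.

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int pfd[2];
        struct pollfd fds[2];

        pipe(pfd);

        fds[0].fd = pfd[0];
        fds[0].events = 0;          /* hangup-only entry, as sshd/X11 use */
        fds[1].fd = pfd[0];
        fds[1].events = POLLIN;     /* same { ident, filter } as fds[0] */

        close(pfd[1]);              /* writer gone: read end hangs up */

        if (poll(fds, 2, 1000) > 0)
            printf("revents: %x %x\n", fds[0].revents, fds[1].revents);
        return 0;
    }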



# e6bc4d0d 28-Mar-2023 Matthew Dillon <dillon@apollo.backplane.com>

poll/select: Fix panic in kqueue backend

* The poll and select system calls use kqueue as a backend and
attempt to cache active events from prior calls to improve
performance.

However, this makes a potential race more likely: in a
high-concurrency application, one thread close()es a descriptor
that another thread had previously used in a poll/select operation,
and this close() races the later poll/select operation that is
attempting to remove the kevent.

* The race can sometimes prevent the poll/select kevent copyout
code from removing previously cached but no-longer-used
events, because the removal references the events by their
descriptor rather than directly and the descriptor is no longer
valid.

This causes kern_kevent() to loop infinitely and hit a panic
designed to check for that situation.

* Fix the problem by moving the removal of old events from the
poll/select copyout code into kqueue_scan(). kqueue_scan()
can detect old unused events using the sequence id that the
poll/select kernel code stores in the kevent.
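
A hypothetical sketch of the idea (identifiers invented here, not the
actual kqueue_scan() code): each registered kevent is stamped with the
sequence id of the poll/select call that used it, so the scan can reap
stale entries by direct reference rather than by descriptor.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <stdint.h>

    struct cached_kev {                 /* hypothetical cache entry */
        struct kevent   kev;
        uint64_t        seq;            /* last poll/select that used it */
    };

    /*
     * Called from the scan loop: an entry whose stamp is older than the
     * current call's sequence was cached by a prior poll/select and not
     * re-requested, so delete it in place -- no fd lookup, so a racing
     * close() cannot strand the knote and wedge kern_kevent().
     */
    static void
    reap_if_stale(struct cached_kev *ck, uint64_t cur_seq)
    {
        if (ck->seq != cur_seq) {
            ck->kev.flags = EV_DELETE;
            /* ... knote removed directly by the scan ... */
        }
    }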



Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 777b8d88 05-Sep-2021 Sascha Wildner <saw@online.de>

kernel: In some files, make it clearer that only 0 is returned.

The error case has already been dealt with earlier in the function.


Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# c41d0ae6 04-Nov-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix uninitialized variable in kqueue_register()

* Fix another uninitialized variable that gcc didn't detect,
this time in kqueue_register().

* Caused kernel compiled with -O0 to not operate properly.

* Initializing the error variable to 0 solves the problem.

Reported-by: zrj
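
The generic bug pattern, reduced to a few lines (illustrative only,
not the kqueue_register() code; the helper is hypothetical): at -O0
nothing happens to mask the garbage left in the uninitialized slot.

    extern int handle_flag(void);       /* hypothetical helper */

    int
    do_thing(int flags)
    {
        int error;                      /* BUG: should be int error = 0; */

        if (flags & 1)
            error = handle_flag();
        return (error);                 /* undefined when (flags & 1) == 0 */
    }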



Revision tags: v5.8.3, v5.8.2
# 80d831e1 25-Jul-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the
system call arguments as a separate pointer, and make the
contents read-only.

int sy_call_t (void *);
int sy_call_t (struct sysmsg *sysmsg, const void *);

* System calls with 6 arguments or less no longer need to copy
the arguments from the trapframe to a holding structure. Instead,
we simply point into the trapframe.

The L1 cache footprint will be a bit smaller, but in simple tests
the results are not noticeably faster... maybe 1ns or so
(roughly 1%).
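
A sketch of the shape of this change (handler and argument names
hypothetical; the sy_call_t prototypes are the ones shown above):

    struct sysmsg;                      /* opaque here */
    struct kevent_args { int fd; };     /* hypothetical args block */

    /* old: int sy_call_t(void *); args were bcopy()d off the trapframe */
    static int
    sys_old_style(void *uap)
    {
        struct kevent_args *ap = uap;
        return (ap->fd);
    }

    /* new: int sy_call_t(struct sysmsg *, const void *); the read-only
     * args pointer can reference the trapframe directly -- no bcopy() */
    static int
    sys_new_style(struct sysmsg *sysmsg, const void *vargs)
    {
        const struct kevent_args *ap = vargs;
        return (ap->fd);
    }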



# d3b97be9 07-Jun-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor kern_kevent(), fix timeout overflow (ppoll() bug) (3)

* Fix a second timer overflow. The systimer clock variable is
actually only 32 bits; a 10-minute timeout will overflow it.
Change the kqueue timeout to 1 minute to work-around.

(We really need to redo sysclock_t from 32 to 64 bits)

* This should finally fix both swildner's panic and rsmarples's
continued early timeout issue.

Reported-by: swildner, rsmarples
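
Back-of-the-envelope numbers (the cputimer frequency below is an
assumption for illustration, not from the commit): a 32-bit tick
counter wraps after 2^32 ticks, so at 10 MHz the horizon is about
429 seconds -- less than the old 10-minute (600 s) kqueue timeout,
while the new 1-minute cap fits easily.

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t freq = 10000000ULL;    /* assumed cputimer Hz */

        printf("32-bit wrap horizon: %llu s\n",
            (unsigned long long)((1ULL << 32) / freq));
        return 0;                       /* prints 429 */
    }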



# 32c8a258 06-Jun-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase KNOTE_CACHE_MAX

* Increase KNOTE_CACHE_MAX from 8 to 64 descriptors. These are tiny
descriptors, we can afford to have a larger per-cpu cache.


# 7eea59b7 04-Jun-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor kern_kevent(), fix timeout overflow (ppoll() bug)

* Fix a bug where large timeouts or very small timeouts could
overflow the ustimeout variable. Change the internal timeout
cap to ensure that no overflow occurs.

* Fix another bug where the internal timeout cap could cause
ppoll() to return early. Internal tsleep (etc) timeouts
need to be ignored because the external timeout might be
larger and will handle the error return for us when it is
checked.

* Refactor kern_kevent() to be somewhat more efficient.

Reported-by: rsmarples
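
A simplified sketch of the retry structure this implies (names
hypothetical): the per-sleep timeout is capped to avoid overflow, so
an internal expiry is ignored and only the caller-supplied deadline
may produce the timeout return.

    #include <errno.h>
    #include <stdint.h>

    #define INTERNAL_CAP_US (60 * 1000000ULL)   /* per-sleep cap */

    extern uint64_t now_us(void);               /* hypothetical clock */
    extern int sleep_up_to(uint64_t us);        /* hypothetical tsleep
                                                   wrapper: 0 = woken,
                                                   nonzero = timed out */

    static int
    wait_with_deadline(uint64_t deadline_us)
    {
        for (;;) {
            uint64_t t = now_us();
            uint64_t remain;

            if (t >= deadline_us)
                return (EWOULDBLOCK);           /* external timeout only */
            remain = deadline_us - t;
            if (remain > INTERNAL_CAP_US)
                remain = INTERNAL_CAP_US;
            if (sleep_up_to(remain) == 0)
                return (0);                     /* event arrived */
            /* internal expiry: ignore, re-check the real deadline */
        }
    }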



Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# 759c3665 24-Nov-2019 Sepherosa Ziehau <sephe@dragonflybsd.org>

kevent: Leading white space.


# bd4f774f 18-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

<sys/event.h>: Make M_KQUEUE static.

Used in a single kernel source only.

While there, minor whitespace cleanup.


# 944cd60c 25-Sep-2019 Sascha Wildner <saw@online.de>

<sys/time.h>: Add 3rd arg to timespecadd()/sub() and make them public.

* Switch to the three argument versions of the timespecadd() and
timespecsub() macros. These are now the predominant ones. FreeBSD,
OpenBSD, NetBSD, and Solaris (albeit only for the kernel) have them.

* Make those macros public too. This allows for a number of cleanups
where they were defined locally.

Pointed-out-by: zrj
Reviewed-by: dillon
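
For reference, the three-argument form as it appears in FreeBSD's
<sys/time.h> (the two-argument predecessors modified the first
operand in place):

    #define timespecadd(tsp, usp, vsp)                                \
        do {                                                          \
            (vsp)->tv_sec = (tsp)->tv_sec + (usp)->tv_sec;            \
            (vsp)->tv_nsec = (tsp)->tv_nsec + (usp)->tv_nsec;         \
            if ((vsp)->tv_nsec >= 1000000000L) {                      \
                (vsp)->tv_sec++;                                      \
                (vsp)->tv_nsec -= 1000000000L;                        \
            }                                                         \
        } while (0)

    /* usage: timespecadd(&now, &timeout, &deadline); */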



Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2
# eb67213a 26-Mar-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Rewrite the callout_*() API

* Rewrite the entire API from scratch and improve compatibility
with FreeBSD. This is not an attempt to achieve full API compatibility,
as FreeBSD's API has unnecessary complexity that coders frequently
misinterpret.

* Remove the IPI mechanisms in favor of fine-grained spin-locks instead.

* Add some robustness features in an attempt to track down corrupted
callwheel lists due to originating subsystems freeing structures out
from under an active callout.

* The code supports a full-blown type-stable/adhoc-reuse structural
separation between the front-end and the back-end, but this feature
is currently not operational and may be removed at some future point.
Instead we currently just embed the struct _callout inside the
struct callout.

* Replace callout_stop_sync() with callout_cancel().

* callout_drain() is now implemented as a synchronous cancel instead
of an asynchronous stop, which is closer to the FreeBSD API and
expected operation for ported code (usb stack in particular). We
will just have to fix any deadlocks which we come across.

* Retain our callout_terminate() function as the 'better' way to
stop using a callout, as it will not only cancel the callout but
also de-flag the structure so it can no longer be used.
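
A usage sketch under the rewritten API (the driver code is
hypothetical; the callout_*() entry points are the ones named above):

    #include <sys/callout.h>

    struct mydev {
        struct callout  timer;
    };

    static void
    mydev_tick(void *arg)
    {
        /* periodic work; may re-arm with callout_reset() */
    }

    static void
    mydev_attach(struct mydev *dev)
    {
        callout_init_mp(&dev->timer);
        callout_reset(&dev->timer, hz, mydev_tick, dev);
    }

    static void
    mydev_detach(struct mydev *dev)
    {
        /*
         * Synchronous cancel that also de-flags the structure so it
         * cannot be re-armed -- preferred at teardown over a bare
         * callout_cancel() or the FreeBSD-compat callout_drain().
         */
        callout_terminate(&dev->timer);
    }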



Revision tags: v5.4.1, v5.4.0
# 9292bb14 25-Nov-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix probable callout race

* Fix a probable callout race in kern/kern_event.c. It is possible
for the callout to be requeued during teardown and for the
structure to subsequently become corrupted.

Manifests as 'stuck' processes (still ^C'able if PCATCH is flagged),
and sleeps which do not expire. Can be triggered by synth bulk runs.

* Increase maximum number of kqueue timers from 4096 to 65536. This
limit will have to be moved to uidinfo (but not in this commit).



Revision tags: v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1
# 81ac2c0d 22-Apr-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf (2)

* Fix lost fp bug, a file pointer would sometimes not get dropped,
leading to disconnection problems (e.g. sftp).


# d6299163 22-Apr-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove SMP bottlenecks on uidinfo, descriptors, and lockf

* Use an eventcounter and the per-thread fd cache to fix
bottlenecks in checkfdclosed(). This will work well for
the vast majority of applications and test benches.

* Batch holdfp*() operations on kqueue collections when implementing
poll() and select(). This significantly improves performance.
Full scaling not yet achieved, however.

* Increase copyin item batching from 8 to 32 for select() and poll().

* Give the uidinfo structure a pcpu array to hold the posixlocks
and openfiles count fields, with a rollup contained in the uidinfo
structure itself.

This removes numerous global bottlenecks related to open(),
close(), dup*(), and lockf operations (posixlocks count).

ui_openfiles will force a rollup on limit reached to be sure
that the limit was actually reached. ui_posixlocks stays fairly
loose. Each cpu rolls up generally only when the pcpu count exceeds
+32 or goes below -32 (see the sketch after this message).

* Give the proc structure a pcpu array for the same counts, in order
to properly support seteuid() and such.

* Replace P_ADVLOCK with a char field proc->p_advlock_flag, and
remove token operations around the field.
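
A minimal sketch of that pcpu counter scheme (field names invented;
the +/-32 window is from the text above): each cpu counts locally and
folds into the shared rollup only when its local delta leaves the
window.

    #include <stdint.h>

    #define NCPU_MAX        256         /* illustration only */
    #define ROLLUP_THRESH   32

    struct loose_count {
        int64_t rollup;                 /* shared, loosely accurate */
        int32_t pcpu[NCPU_MAX];         /* per-cpu deltas */
    };

    static void
    loose_count_adj(struct loose_count *lc, int cpuid, int delta)
    {
        int32_t n = lc->pcpu[cpuid] + delta;

        if (n > ROLLUP_THRESH || n < -ROLLUP_THRESH) {
            /* fold the local delta into the shared rollup */
            __sync_fetch_and_add(&lc->rollup, (int64_t)n);
            n = 0;
        }
        lc->pcpu[cpuid] = n;
    }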



# 35949930 20-Apr-2018 Matthew Dillon <dillon@apollo.backplane.com>

kernel - per-thread fd cache, p_fd lock bypass

* Implement a per-thread (fd,fp) cache. Cache hits can keep fp's
in a held state (avoiding the need to fhold()/fdrop() the ref count),
and bypasses the p_fd spinlock. This allows the file pointer structure
to generally be shared across cpu caches.

* Can cache up to four descriptors in each thread, LRU. This is the common
case. Highly threaded programs tend to focus work on distinct
file descriptors in each thread.

* One file descriptor can be cached in up to four threads. This is
a significant limitation, though relatively uncommon. On a cache miss
the code drops into the normal shared p_fd spinlock lookup.
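
The shape of the cache, as described above (a sketch; the field and
type names are invented):

    #define NFDCACHE 4                  /* four entries per thread, LRU */

    struct file;                        /* kernel file pointer */

    struct fdcache_ent {
        int             fd;             /* -1 when empty */
        struct file     *fp;            /* held: no fhold()/fdrop() per use */
        unsigned int    lru;            /* larger == more recently used */
    };

    struct td_fdcache {
        struct fdcache_ent ent[NFDCACHE];
    };

    /*
     * Lookup: a hit returns the already-held fp without touching the
     * p_fd spinlock; a miss falls back to the normal shared-spinlock
     * descriptor table path.
     */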



Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1
# cf83cc19 19-Aug-2017 Imre Vadász <imre@vdsz.com>

kqueue: Make EVFILT_USER event behaviour more consistent.

* Stop abusing the kn->kn_sfflags value for storing the current state of
the EVFILT_USER filter. Instead use kn->kn_fflags like other filters.
Similarly store the data value in kn->kn_data instead of kn->kn_sdata.
This means that the fflags value gets reset when EV_CLEAR was specified
at registration and the event is received by userspace. This behaviour
is consistent with existing kqueue filters, and allows using EVFILT_USER
properly as an edge-triggered event via its fflags, not just as a
level-triggered one.

* Don't clear kn->kn_fflags when the event is modified with EV_CLEAR. Doing
this wasn't affecting the actual state of the EVFILT_USER event before
this change (since the state was kept in kn->kn_sfflags instead).

* All this also avoids blindly copying the fflags value that was specified
when adding the event. Instead the NOTE_FFLAGSMASK mask is applied, and
the NOTE_FF* options are used, so the returned fflags value should now
always only have the lower 24 bits set.

* Make setting the fflags and data value when adding the event work as
might be expected.
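
A userspace illustration of the edge-triggered pattern this enables
(standard kqueue API; error handling omitted):

    #include <sys/types.h>
    #include <sys/event.h>

    static void
    user_event_demo(int kq)
    {
        struct kevent kev;

        /* register; with EV_CLEAR, fflags/data reset on delivery */
        EV_SET(&kev, 1, EVFILT_USER, EV_ADD | EV_CLEAR, NOTE_FFNOP,
            0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        /* trigger, OR-ing application flag bits into fflags */
        EV_SET(&kev, 1, EVFILT_USER, 0, NOTE_TRIGGER | NOTE_FFOR | 0x1,
            0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);

        /* receive: fflags carries only the NOTE_FFLAGSMASK low 24 bits */
        kevent(kq, NULL, 0, &kev, 1, NULL);
    }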



# 5596130d 21-Aug-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix callout race and panic

* Fix race (found by ivadasz) related to the IPI/WAIT code unblocking
before the IPI is able to finish adjusting the knode and callout.
The wait code was only waiting for the IPI counter to reach 0 via
IPI_MASK, it also had to wait for the ARMED bit to get cleared.

* Avoid retesting c->c_flags to handle WAITING bit change races. Instead,
fully integrate the test-and-clear of the WAITING bit into
callout_unpend_disarm().

* Fix an issue where callout_terminate() fails to IPI the remote cpu
due to the function dispatch code clearing the ARMED bit. No
longer clear the ARMED bit. This ensures that a termination or stop
waits for the callout to return.

This change means that synchronous callout operations to other cpus will
be more expensive. However, the kernel generally does not do cross-cpu
callouts any more, so it's generally a non-problem.

* Remove the now unused callout_maybe_clear_armed() inline.

* Also clear kn->kn_hook for EVFILT_TIMER when removing a callout, as
a safety.

Reported-by: ivadasz (Imre Vadasz)



# 6b38d89f 19-Aug-2017 Imre Vadász <imre@vdsz.com>

kqueue: Fix typo in filt_userattach function: kn_fflags vs. kn_sfflags.

* This typo meant that adding an EVFILT_USER event with NOTE_TRIGGER already
set would fail to trigger the user event.

* So far I haven't found any EVFILT_USER usage in open-source code where
the NOTE_TRIGGER flag is set when adding the EVFILT_USER event, so this
fix seems to be a corner case in practice.



Revision tags: v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 5a4bb8ec 09-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Incidental MPLOCK removal (non-performance)

* proc filterops.

* kernel linkerops and kld code.

* Warn if a non-MPSAFE interrupt is installed.

* Use a private token in the disk messaging core (subr_disk) instead of
the mp token.

* Use a private token for sysv shm administrative calls.

* Cleanup.



Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# 6d2444c4 12-Dec-2015 Imre Vadasz <imre@vdsz.com>

kernel - Implement ppoll system call with precise microseconds timeout.

* Implement a maximum timeout of 2000s, because systimer(9) accepts only
an int timeout in microseconds.

* Add kern.kv_sleep_threshold sysctl variable for tuning the threshold for
the ppoll sleep duration (in nanoseconds), below which we will
busy-loop with DELAY instead of using tsleep for waiting.
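
The call shape from userspace (the 2000 s cap and the
kern.kv_sleep_threshold tunable are kernel-side details):

    #include <poll.h>
    #include <time.h>

    static int
    wait_readable_250us(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 250000 };

        /* sub-millisecond timeout, impossible with poll(2)'s ms arg */
        return (ppoll(&pfd, 1, &ts, NULL));
    }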



# 5e25370d 29-Apr-2016 Tomohiro Kusumi <kusumi.tomohiro@gmail.com>

sys/kern: Add kqueue EVFILT_FS

Brought in from FreeBSD@GitHub bbaa6c3ec045b7de225f726d3c9367510b287184.
Needed by autofs.

Triggers an event on mount(2) and unmount(2).

Also see https://bugs.dragonflybsd.org/issues/2905.
Reviewed-by: sephe
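
Registering for the new filter from userspace might look like this
(a sketch following the FreeBSD-derived behaviour; EVFILT_FS is
global rather than per-descriptor, so ident is unused):

    #include <sys/types.h>
    #include <sys/event.h>

    static void
    watch_mounts(int kq)
    {
        struct kevent kev;

        EV_SET(&kev, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL);
        kevent(kq, &kev, 1, NULL, 0, NULL);
        /* later kevent() waits wake up on mount(2)/unmount(2) */
    }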



# 67a73dd3 03-May-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

Revert "kqueue: Avoid reprocessing processed knotes in KNOTE."

This reverts commit ed9db6a1912db34af387ff6978a265003258df16.

This causes a panic under certain network load.

Reported-by: pavalos@


# 271ec5b5 03-May-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

Revert "kqueue: Return value of knote_release is no longer useful."

This reverts commit b75b5648541ea38deaf678ee62466780ffe62374.

Prepare to revert ed9db6a1912db34af387ff6978a265003258df16, which c

Revert "kqueue: Return value of knote_release is no longer useful."

This reverts commit b75b5648541ea38deaf678ee62466780ffe62374.

Prepare to revert ed9db6a1912db34af387ff6978a265003258df16, which causes
a panic under certain network load.

Reported-by: pavalos@



# 7a528cd4 14-Apr-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

kqueue: Use critical section for knote cache

So knote_free() can be triggered from interrupt threads safely.

Suggested-by: dillon@
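
The pattern, sketched (the cache type and fields are hypothetical;
crit_enter()/crit_exit() are the DragonFly critical-section
primitives):

    static void
    knote_cache_insert(struct knote_cache_list *list, struct knote *kn)
    {
        /*
         * The critical section blocks preemption on this cpu, so an
         * interrupt thread running knote_free() cannot interleave
         * with this per-cpu cache update.
         */
        crit_enter();
        /* ... push kn onto the per-cpu free cache, bump its count ... */
        crit_exit();
    }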

