.\"	$OpenBSD: kqueue.2,v 1.22 2007/05/31 19:19:32 jmc Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.18 2001/02/14 08:48:35 guido Exp $
.\"
.Dd $Mdocdate: May 31 2007 $
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh SYNOPSIS
.Fd #include <sys/types.h>
.Fd #include <sys/event.h>
.Fd #include <sys/time.h>
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed
.Dq filters .
A kevent is identified by the (ident, filter) pair; there may be only
one kevent with a given pair per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single
.Li struct kevent .
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
.Fn kevent
is used to register events with the queue and to return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if a
.Fa timeout
is specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-null pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a
.Li struct timespec .
If
.Fa timeout
is a null pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-null, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	u_int	ident;		/* identifier for this event */
	short	filter;		/* filter for event */
	u_short	flags;		/* action flags for kqueue */
	u_int	fflags;		/* filter flag value */
	int	data;		/* filter data value */
	void	*udata;		/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Li struct kevent
are:
.Bl -tag -width XXXfilter
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but it is often a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
The pre-defined system filters are described below.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event modifies the parameters of the original event
rather than creating a duplicate entry.
Adding an event automatically enables it, unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width EVFILT_SIGNAL
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from the current position to end of file,
and may be negative.
If
.Dv NOTE_EOF
is set in
.Va fflags ,
.Fn kevent
will also return when the file pointer is at the end of file.
The end of file condition is indicated by the presence of
.Dv NOTE_EOF
in
.Va fflags
on return.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the filter will resume waiting for data to become
available before returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, and FIFOs,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for the FIFO case,
this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Li struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as
.Fn aio_error .
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on architectures with 64-bit pointers,
and should be considered deprecated.
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It Dv NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
The file referenced by the descriptor was extended.
.It Dv NOTE_TRUNCATE
The file referenced by the descriptor was truncated.
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_LINK
The link count on the file changed.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It Dv NOTE_EXIT
The process has exited.
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process image via
.Xr execve 2
or a similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with
.Dv NOTE_FORK
set in the
.Va fflags
field, while the child process will return with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.It Dv NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record all attempts to deliver a signal to a process,
even if the signal has been marked as
.Dv SIG_IGN .
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.El
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Sh BUGS
It is currently not possible to watch FIFOs, AIO, or a vnode that
resides on anything but a UFS file system.
.Pp
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
535