.\"	$NetBSD: kqueue.2,v 1.34 2015/03/02 19:24:19 christos Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001, 2002, 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd March 2, 2015
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kqueue1 ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kqueue1 "int flags"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "\*[Am]kev" ident filter flags fflags data udata
55.Sh DESCRIPTION
56.Fn kqueue
57provides a generic method of notifying the user when an event
58happens or a condition holds, based on the results of small
59pieces of kernel code termed filters.
60A kevent is identified by the (ident, filter) pair; there may only
61be one unique kevent per kqueue.
62.Pp
63The filter is executed upon the initial registration of a kevent
64in order to detect whether a preexisting condition is present, and is also
65executed whenever an event is passed to the filter for evaluation.
66If the filter determines that the condition should be reported,
67then the kevent is placed on the kqueue for the user to retrieve.
68.Pp
69The filter is also run when the user attempts to retrieve the kevent
70from the kqueue.
71If the filter indicates that the condition that triggered
72the event no longer holds, the kevent is removed from the kqueue and
73is not returned.
74.Pp
75Multiple events which trigger the filter do not result in multiple
76kevents being placed on the kqueue; instead, the filter will aggregate
77the events into a single struct kevent.
78Calling
79.Fn close
80on a file descriptor will remove any kevents that reference the descriptor.
81.Pp
82.Fn kqueue
83creates a new kernel event queue and returns a descriptor.
84.Pp
85The
86.Fn kqueue1
87also allows to set the following
88.Fa flags
89on the returned file descriptor:
90.Bl -column O_NONBLOCK -offset indent
91.It Dv O_CLOEXEC
92Set the close on exec property.
93.It Dv O_NONBLOCK
94Sets non-blocking I/O.
95.It Dv O_NOSIGPIPE
96Return
97.Er EPIPE
98instead of raising
99.Dv SIGPIPE .
100.El
101The queue is not inherited by a child created with
102.Xr fork 2 .
103.\" However, if
104.\" .Xr rfork 2
105.\" is called without the
106.\" .Dv RFFDG
107.\" flag, then the descriptor table is shared,
108.\" which will allow sharing of the kqueue between two processes.
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.In sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
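.Pp
As a minimal sketch of the timeout conventions above, the helper below
converts a millisecond count into the struct timespec form that the
timeout argument expects.
The name ms_to_timespec is hypothetical and is not part of any system
interface.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper (not a system interface): convert a non-negative
 * millisecond count into a struct timespec suitable for the timeout
 * argument of kevent().
 */
struct timespec
ms_to_timespec(long ms)
{
	struct timespec ts;

	assert(ms >= 0);			/* negative intervals are not meaningful */
	ts.tv_sec = ms / 1000;			/* whole seconds */
	ts.tv_nsec = (ms % 1000) * 1000000L;	/* remainder in nanoseconds */
	return ts;
}
```

Storing the result of ms_to_timespec(0) in a variable and passing its
address yields the zero-valued structure that makes the call a poll,
whereas passing NULL instead makes the call wait indefinitely.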
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	uint32_t  filter;	/* filter for event */
	uint32_t  flags;	/* action flags for kqueue */
	uint32_t  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	intptr_t  udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are pre-defined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original
event, and not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Ss Filters
Filters are identified by a number.
There are two types of filters: pre-defined filters, which
are described below, and third-party filters that may be added with
.Xr kfilter_register 9
by kernel sub-systems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice-versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char	 *name;		/* name to lookup or return */
	size_t	 len;		/* length of name */
	uint32_t filter;	/* filter to lookup or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 . )
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying filesystem was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
The exit code of the process is stored in
.Va data .
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.It EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless EV_ONESHOT is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
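.Pp
To illustrate the EVFILT_TIMER description above, the following sketch
arms a one-shot timer and waits for it to expire.
The function name oneshot_timer_fired, the timer identifier 1, and the
one-second wait limit are hypothetical choices made for illustration.
The preprocessor guard and the ENOSYS fallback are likewise assumptions,
added only so that the fragment compiles on systems that lack kqueue.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#endif

/*
 * Hypothetical helper: arm a one-shot timer of period_ms milliseconds and
 * wait up to one second for it to expire.
 * Returns 1 if the timer fired, 0 if the wait timed out, -1 on error.
 */
int
oneshot_timer_fired(int period_ms)
{
#ifdef HAVE_KQUEUE
	struct kevent ev;
	const struct timespec tout = { .tv_sec = 1, .tv_nsec = 0 };
	int kq, nev, saved;

	assert(period_ms > 0);
	if ((kq = kqueue()) == -1)
		return -1;
	/* ident 1 is an arbitrary timer identifier; data is the period. */
	EV_SET(&ev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, period_ms, 0);
	/* The same structure serves as changelist and eventlist. */
	nev = kevent(kq, &ev, 1, &ev, 1, &tout);
	saved = errno;
	close(kq);		/* closing the queue discards the timer */
	errno = saved;
	if (nev == 1 && (ev.flags & EV_ERROR) != 0) {
		errno = (int)ev.data;	/* registration failed */
		return -1;
	}
	return nev;
#else
	assert(period_ms > 0);
	errno = ENOSYS;		/* assumption: no kqueue on this platform */
	return -1;
#endif
}
```

Dropping EV_ONESHOT from the flags would leave the timer periodic, in
which case data in a returned event counts the expirations since the
previous retrieval.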
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and
.Va errno
is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
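.Pp
The three outcomes described above (an error return, a timeout, and a
per-event EV_ERROR report) can be handled as in the following sketch,
which waits for a descriptor to become readable.
The function name wait_readable and its calling convention are
hypothetical.
The preprocessor guard and the ENOSYS fallback are assumptions, added
only so that the fragment compiles on systems that lack kqueue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#endif

/*
 * Hypothetical helper: wait up to `seconds` for fd to become readable.
 * Returns 1 and stores the filter's data value in *nbytes when readable,
 * 0 when the time limit expires, and -1 with errno set on error.
 */
int
wait_readable(int fd, time_t seconds, long long *nbytes)
{
	assert(nbytes != NULL);
#ifdef HAVE_KQUEUE
	struct kevent ev;
	const struct timespec tout = { .tv_sec = seconds, .tv_nsec = 0 };
	int kq, nev, saved;

	if ((kq = kqueue()) == -1)
		return -1;
	EV_SET(&ev, fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, 0);
	/* Register and retrieve in one call, reusing the same structure. */
	nev = kevent(kq, &ev, 1, &ev, 1, &tout);
	saved = errno;
	close(kq);
	errno = saved;
	if (nev == -1)			/* the call itself failed */
		return -1;
	if (nev == 0)			/* time limit expired */
		return 0;
	if ((ev.flags & EV_ERROR) != 0) {	/* the change was rejected */
		errno = (int)ev.data;
		return -1;
	}
	*nbytes = (long long)ev.data;
	return 1;
#else
	(void)fd;
	(void)seconds;
	errno = ENOSYS;		/* assumption: no kqueue on this platform */
	return -1;
#endif
}
```

For a pipe that already holds three unread bytes, a call such as
wait_readable(fd, 1, \*[Am]n) would be expected to return 1 with n set to 3
on a system that provides kqueue.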
.Sh EXAMPLES
The following example program monitors a file (provided to it as the first
argument) and prints information about some common events it receives
notifications for:
.Bd -literal -offset indent
#include \*[Lt]sys/types.h\*[Gt]
#include \*[Lt]sys/event.h\*[Gt]
#include \*[Lt]sys/time.h\*[Gt]
#include \*[Lt]stdio.h\*[Gt]
#include \*[Lt]unistd.h\*[Gt]
#include \*[Lt]stdlib.h\*[Gt]
#include \*[Lt]fcntl.h\*[Gt]
#include \*[Lt]err.h\*[Gt]

int
main(int argc, char *argv[])
{
        int fd, kq, nev;
        struct kevent ev;
        static const struct timespec tout = { 1, 0 };

        /* The file to monitor must be given as the only argument. */
        if (argc != 2)
                errx(1, "usage: %s file", argv[0]);

        if ((fd = open(argv[1], O_RDONLY)) == -1)
                err(1, "Cannot open `%s'", argv[1]);

        if ((kq = kqueue()) == -1)
                err(1, "Cannot create kqueue");

        /* Register interest in all vnode events on the file. */
        EV_SET(\*[Am]ev, fd, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR,
            NOTE_DELETE|NOTE_WRITE|NOTE_EXTEND|NOTE_ATTRIB|NOTE_LINK|
            NOTE_RENAME|NOTE_REVOKE, 0, 0);
        if (kevent(kq, \*[Am]ev, 1, NULL, 0, \*[Am]tout) == -1)
                err(1, "kevent");
        for (;;) {
                /* Wait up to one second for the next event. */
                nev = kevent(kq, NULL, 0, \*[Am]ev, 1, \*[Am]tout);
                if (nev == -1)
                        err(1, "kevent");
                if (nev == 0)
                        continue;
                /* Report each known event and clear its bit. */
                if (ev.fflags \*[Am] NOTE_DELETE) {
                        printf("deleted ");
                        ev.fflags \*[Am]= ~NOTE_DELETE;
                }
                if (ev.fflags \*[Am] NOTE_WRITE) {
                        printf("written ");
                        ev.fflags \*[Am]= ~NOTE_WRITE;
                }
                if (ev.fflags \*[Am] NOTE_EXTEND) {
                        printf("extended ");
                        ev.fflags \*[Am]= ~NOTE_EXTEND;
                }
                if (ev.fflags \*[Am] NOTE_ATTRIB) {
                        printf("chmod/chown/utimes ");
                        ev.fflags \*[Am]= ~NOTE_ATTRIB;
                }
                if (ev.fflags \*[Am] NOTE_LINK) {
                        printf("hardlinked ");
                        ev.fflags \*[Am]= ~NOTE_LINK;
                }
                if (ev.fflags \*[Am] NOTE_RENAME) {
                        printf("renamed ");
                        ev.fflags \*[Am]= ~NOTE_RENAME;
                }
                if (ev.fflags \*[Am] NOTE_REVOKE) {
                        printf("revoked ");
                        ev.fflags \*[Am]= ~NOTE_REVOKE;
                }
                printf("\\n");
                /* Any bits still set are events this program does not know. */
                if (ev.fflags)
                        warnx("unknown event 0x%x", ev.fflags);
        }
}
.Ed
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er EOPNOTSUPP
This type of file descriptor is not supported for
.Fn kevent
operations.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Rs
.%A Jonathan Lemon
.%T "Kqueue: A Generic and Scalable Event Notification Facility"
.%I USENIX Association
.%B Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference
.%D June 25-30, 2001
.%U http://www.usenix.org/event/usenix01/freenix01/full_papers/lemon/lemon.pdf
.Re
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx 2.0 .
The
.Fn kqueue1
function first appeared in
.Nx 6.0 .