.\"	$OpenBSD: kqueue.2,v 1.22 2007/05/31 19:19:32 jmc Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.18 2001/02/14 08:48:35 guido Exp $
.\"
.Dd $Mdocdate: May 31 2007 $
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh SYNOPSIS
.Fd #include <sys/types.h>
.Fd #include <sys/event.h>
.Fd #include <sys/time.h>
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "int nchanges" "struct kevent *eventlist" "int nevents" "const struct timespec *timeout"
.Fn EV_SET "&kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed
.Dq filters .
A kevent is identified by the (ident, filter) pair; a kqueue may hold
only one kevent for each such pair.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single
.Li struct kevent .
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
.Pp
.Fn kevent
is used to register events with the queue, and return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the size of
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
determines the size of
.Fa eventlist .
When
.Fa nevents
is zero,
.Fn kevent
will return immediately even if there is a
.Fa timeout
specified, unlike
.Xr select 2 .
If
.Fa timeout
is a non-null pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a
.Li struct timespec .
If
.Fa timeout
is a null pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be non-null, pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	u_int	ident;		/* identifier for this event */
	short	filter;		/* filter for event */
	u_short	flags;		/* action flags for kqueue */
	u_int	fflags;		/* filter flag value */
	int	data;		/* filter data value */
	void	*udata;		/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Li struct kevent
are:
.Bl -tag -width XXXfilter
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
The pre-defined system filters are described below.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT
.It Dv EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original event,
and not result in a duplicate entry.
Adding an event automatically enables it, unless overridden by the
.Dv EV_DISABLE
flag.
.It Dv EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It Dv EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It Dv EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It Dv EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It Dv EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It Dv EV_EOF
Filters may set this flag to indicate filter-specific EOF condition.
.It Dv EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Pp
The predefined system filters are listed below.
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Bl -tag -width EVFILT_SIGNAL
.It Dv EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog.
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
.Dv NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets
.Dv EV_EOF
in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
If
.Dv NOTE_EOF
is set in
.Va fflags ,
.Fn kevent
will also return when the file pointer is at the end of file.
The end of file condition is indicated by the presence of
.Dv NOTE_EOF
in
.Va fflags
on return.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set
.Dv EV_EOF
in
.Va flags .
This may be cleared by passing in
.Dv EV_CLEAR ,
at which point the filter will resume waiting for data to become
available before returning.
.It "BPF devices"
Returns when the BPF buffer is full, the BPF timeout has expired, or
when the BPF has
.Dq immediate mode
enabled and there is any data to read;
.Va data
contains the number of bytes available.
.El
.It Dv EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, and FIFOs,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set
.Dv EV_EOF
when the reader disconnects, and for the FIFO case,
this may be cleared by use of
.Dv EV_CLEAR .
Note that this filter is not supported for vnodes or BPF devices.
.Pp
For sockets, the low water mark and socket error handling are
identical to the
.Dv EVFILT_READ
case.
.It Dv EVFILT_AIO
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to
.Dv SIGEV_KEVENT .
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Li struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on architectures with 64-bit pointers,
and should be considered deprecated.
.It Dv EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It Dv NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It Dv NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It Dv NOTE_EXTEND
The file referenced by the descriptor was extended.
.It Dv NOTE_TRUNCATE
The file referenced by the descriptor was truncated.
.It Dv NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It Dv NOTE_LINK
The link count on the file changed.
.It Dv NOTE_RENAME
The file referenced by the descriptor was renamed.
.It Dv NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It Dv NOTE_EXIT
The process has exited.
.It Dv NOTE_FORK
The process has called
.Fn fork .
.It Dv NOTE_EXEC
The process has executed a new process via
.Xr execve 2
or similar call.
.It Dv NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with
.Dv NOTE_FORK
set in the
.Va fflags
field, while the child process will return with
.Dv NOTE_CHILD
set in
.Va fflags
and the parent PID in
.Va data .
.It Dv NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It Dv EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record all attempts to deliver a signal to a process,
even if the signal has been marked as
.Dv SIG_IGN .
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.It Dv EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless
.Dv EV_ONESHOT
is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the
.Dv EV_CLEAR
flag internally.
.El
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of -1 is
returned and
.Va errno
is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv -1
will be returned, and
.Va errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Sh BUGS
It is currently not possible to watch FIFOs, AIO, or a vnode that
resides on anything but a UFS file system.
.Pp
The
.Fa timeout
value is limited to 24 hours; longer timeouts will be silently
reinterpreted as 24 hours.
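.Pp
A caller that needs to know whether a requested interval exceeds the
24-hour limit can check before calling
.Fn kevent .
The following is a minimal sketch under stated assumptions and is not part
of the kqueue interface: the helper
.Fn ms_to_timeout
and the constant
.Dv KEVENT_MAX_TIMEOUT_MS
are hypothetical names, and the 24-hour bound is taken from the paragraph
above.

```c
#include <stdbool.h>
#include <time.h>

/* Assumption: 24 hours, per the BUGS text, is the largest honored timeout. */
#define KEVENT_MAX_TIMEOUT_MS (24LL * 60 * 60 * 1000)

/*
 * Hypothetical helper: fill `ts` from a millisecond count so it can be
 * passed as the timeout argument.  Negative input is treated as zero.
 * Returns true if the value was clamped to 24 hours, false otherwise.
 */
bool
ms_to_timeout(long long ms, struct timespec *ts)
{
	bool clamped = false;

	if (ms < 0)
		ms = 0;
	if (ms > KEVENT_MAX_TIMEOUT_MS) {
		ms = KEVENT_MAX_TIMEOUT_MS;
		clamped = true;
	}
	ts->tv_sec = (time_t)(ms / 1000);
	ts->tv_nsec = (long)(ms % 1000) * 1000000L;
	return (clamped);
}
```

An argument of zero produces a zero-valued
.Va timespec ,
which effects a poll when passed as
.Fa timeout ;
a true return value tells the caller to re-arm the wait itself if the full
interval matters.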