.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd October 2, 2007
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Pq Nm No for brevity
refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices on clock
interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can control precisely how much time is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock caused by
packets not being processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5 :
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.cpumask
A bitmask that controls which CPUs support device polling.
Default is 0xffffffff.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per CPU
.Xr sysctl 8
MIB variables
.Po Em X
is the CPU number
.Pc :
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.X.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 2000.
.Pp
.It Va kern.polling.cpumask
A read-only bitmask of the CPUs that support device polling.
.Pp
.It Va kern.polling.defcpu
The default CPU used to run device polling (read only).
.Pp
.It Va kern.polling.X.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.X.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.X.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.X.burst_max
Upper bound for
.Va kern.polling.X.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network with pollhz=1000.
.Pp
.It Va kern.polling.X.reg_frac
Controls how often (every
.Va reg_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
the error detection.
Default is 20.
.Pp
.It Va kern.polling.X.handlers
How many active devices have registered for
.Nm .
.Pp
.It Va kern.polling.X.short_ticks
.It Va kern.polling.X.lost_polls
.It Va kern.polling.X.pending_polls
.It Va kern.polling.X.residual_burst
.It Va kern.polling.X.phase
.It Va kern.polling.X.suspect
.It Va kern.polling.X.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
.Xr wi 4
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va kern.polling.X.pollhz
to at least 1000.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime and added per CPU polling.