.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd November 16, 2012
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over how a CPU is scheduled among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chance of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls the
devices on clock interrupts.
This way, the context switch overhead of device interrupts is removed.
Furthermore,
the operating system can accurately control how much work is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock in which
received packets are not processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the help of the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
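.Pp
The effect of bounding the work done on each clock tick can be
illustrated with the following simplified model of one polling tick,
written in C.
It is only a sketch under stated assumptions, not the
.Dx
kernel implementation:
the functions
.Fn fake_npoll
and
.Fn poll_tick
are hypothetical, the packet budget and chunk size stand in for the
.Va burst
and
.Va each_burst
variables described under
.Sx MIB Variables ,
and the dynamic adjustment of the burst by the kernel is not modeled.
.Bd -literal -offset indent
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a driver's *_npoll callback: process up
 * to "count" of the packets waiting on interface "ifidx" and return
 * how many were actually processed.
 */
static int
fake_npoll(int *pending, int ifidx, int count)
{
	int n = pending[ifidx] < count ? pending[ifidx] : count;

	pending[ifidx] -= n;
	return (n);
}

/*
 * One polling tick: process at most "burst" packets in total, visiting
 * the interfaces round-robin and giving each at most "each_burst"
 * packets per visit.  Stops early once no interface has work left.
 */
static int
poll_tick(int *pending, int nhandlers, int burst, int each_burst)
{
	int done = 0;
	int progress = 1;

	while (done < burst && progress) {
		progress = 0;
		for (int i = 0; i < nhandlers && done < burst; i++) {
			/* Never hand out more than the remaining budget. */
			int chunk = burst - done < each_burst ?
			    burst - done : each_burst;
			int n = fake_npoll(pending, i, chunk);

			done += n;
			if (n > 0)
				progress = 1;
		}
	}
	return (done);
}

int
main(void)
{
	/* Interface 0 is flooded, interface 1 has a little traffic. */
	int pending[3] = { 300, 20, 0 };
	int done = poll_tick(pending, 3, 250, 50);

	assert(done <= 250);	/* the per-tick budget is respected */
	printf("processed %d, left %d %d %d", done, pending[0],
	    pending[1], pending[2]);
	puts("");		/* terminate the output line */
	return (0);
}
.Ed
.Pp
With the values used in
.Fn main ,
interface 1 has its 20 packets processed in the first round even though
interface 0 is flooded, and the tick stops after 250 packets,
leaving 70 packets on interface 0 for the next tick.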
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency in Hz, in the range 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
with the remaining fraction available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250; with the default
.Va pollhz
of 6000 this allows up to 1500000 packets per second,
which is adequate for a 1000Mbit network
(about 1488000 packets per second with minimum-size Ethernet frames).
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for transmission
completion events.
Increasing this value reduces the time spent checking for
transmission completion events, and thus reduces bus load,
but it also increases the chance
that the transmission queue becomes saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
Default is 120, which with the default
.Va pollhz
of 6000 means a check every 0.02 seconds.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
drivers support
.Nm
of multiple reception queues.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling,
and added support for polling multiple reception queues.