.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
With network device polling, device interrupts are disabled and the
network devices are polled on clock interrupts instead.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work to spend
in handling network device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock because
packets are not processed to completion.
.Ss Enabling polling
Network device polling is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
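.Pp
As a brief illustration, the commands below sketch how
.Nm
might be switched on and off for one interface.
The interface name
.Li em0
is hypothetical, and the
.Cm polling
and
.Cm -polling
keywords are an assumption here; check
.Xr ifconfig 8
for the exact keyword on the running system.
.Bd -literal -offset indent
# Assumed keyword: turn polling on for em0 (it need not be "up").
ifconfig em0 polling
# Assumed keyword: turn polling off again.
ifconfig em0 -polling
.Ed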
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network with pollhz=6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
devices support multiple reception queue based
.Nm .
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
and
.Xr igb 4
devices support multiple transmission queue based
.Nm .
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling,
and added multiple reception and transmission queue polling support.