.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
that is incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
With network device polling, device interrupts are disabled and
the network devices are instead polled on clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much time is spent
handling network device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock because
packets are not processed to completion.
.Ss Enabling polling
The
.Nm
feature of an interface is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
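.Pp
As a brief sketch, assuming an interface named
.Li em0
(a hypothetical name used only for illustration) and assuming that
.Xr ifconfig 8
accepts the
.Cm polling
and
.Cm -polling
keywords, the feature could be toggled roughly as follows;
consult
.Xr ifconfig 8
for the exact keywords on the running system:
.Bd -literal -offset indent
# turn polling on for em0 (interface name is an assumption)
ifconfig em0 polling
# turn polling off again and return to interrupt mode
ifconfig em0 -polling
.Ed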
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percent of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network and pollhz=6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
of the transmission queue becoming saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
.Xr mxge 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
.Xr ix 4 ,
.Xr jme 4 ,
and
.Xr mxge 4
devices support
.Nm
based on multiple reception queues.
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr ix 4
devices support
.Nm
based on multiple transmission queues.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
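.Pp
As a rough illustration of that advice: if devices are polled every
.Pq 1 / Va pollhz
seconds and polls run on schedule, a packet may wait up to about one
polling interval before it is noticed, which is roughly 1 ms at 1000
and roughly 0.17 ms at the default of 6000.
A minimal sketch of changing the value at runtime, assuming CPU 0 and
using 2000 purely as an example figure:
.Bd -literal -offset indent
# inspect the current polling frequency of CPU 0
sysctl net.ifpoll.0.pollhz
# set it for CPU 0 only (CPU number and value are assumptions)
sysctl net.ifpoll.0.pollhz=2000
.Ed
.Pp
The default for all CPUs can instead be set at boot through the
corresponding loader tunable in
.Xr loader.conf 5 ,
again with an example value:
.Bd -literal -offset indent
# loader.conf(5) entry; 2000 is only an example value
net.ifpoll.pollhz="2000"
.Ed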
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling
and added multiple reception and transmission queue polling support.