.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd November 16, 2012
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of the CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chance of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices
on clock interrupts.
This removes the per-interrupt context switch overhead.
Furthermore,
the operating system can accurately control how much time is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU time for other tasks.
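.Pp
The following user-space sketch illustrates how such a reservation
can be enforced.
It is modelled loosely on the classic
.Fx
polling code and is not the
.Dx
implementation; the function
.Fn adjust_burst
and its parameters are hypothetical.
After each tick, the time spent polling is compared with the share of
the tick not reserved by
.Va rx.user_frac
(described below), and the per-tick packet budget is decreased by one
if polling overran that share, or increased by one (up to
.Va rx.burst_max )
otherwise.
.Bd -literal -offset indent
#include <stdio.h>

/*
 * Hypothetical sketch: adapt the per-tick packet budget ("burst")
 * so that polling uses at most (100 - user_frac) percent of a tick.
 * Not the actual DragonFly code.
 */
static int
adjust_burst(int burst, int burst_max, int user_frac,
    long poll_usec, long tick_usec)
{
	/* Percentage of the tick spent polling (may exceed 100). */
	long kern_load = poll_usec * 100 / tick_usec;

	if (kern_load > 100 - user_frac) {
		/* Polling overran its share: shrink the budget. */
		if (burst > 1)
			burst--;
	} else {
		/* Spare time left in the tick: grow the budget. */
		if (burst < burst_max)
			burst++;
	}
	return burst;
}

int
main(void)
{
	/* pollhz = 6000 gives a tick of about 166 microseconds. */
	long tick_usec = 1000000 / 6000;
	int burst = 5;
	char buf[64];

	/* 50 us of polling is about 30% of the tick: budget grows. */
	burst = adjust_burst(burst, 250, 50, 50, tick_usec);
	/* 120 us of polling is about 72% of the tick: budget shrinks. */
	burst = adjust_burst(burst, 250, 50, 120, tick_usec);

	snprintf(buf, sizeof(buf), "burst = %d", burst);
	puts(buf);
	return 0;
}
.Ed
Growing the budget by one packet at a time while shrinking it as soon
as the share is exceeded keeps the budget close to the largest value
that still leaves the reserved fraction of each tick to other work.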
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is no risk of livelock caused by
packets not being processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, in Hz; the valid range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000 Mbit/s network
with
.Va pollhz
set to 6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
Number of active devices registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for transmission
completion events.
Increasing this value reduces the time spent checking for
transmission completion events, and thus reduces bus load,
but it also increases the chance
that the transmission queue becomes saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
Number of active devices registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
Number of active devices registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
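.Pp
The limits and intervals implied by these variables follow directly
from their definitions.
The sketch below evaluates them for the defaults listed above
(pollhz of 6000, burst_max of 250, tx_frac of 1, status_frac of 120);
the function names are hypothetical.
For comparison,
.Fn gige_min_frame_pps
computes the worst case for Gigabit Ethernet: a 64-byte minimum frame
plus 20 bytes of preamble, start-of-frame delimiter and inter-frame gap
occupies 672 bit times on the wire.
.Bd -literal -offset indent
#include <stdio.h>

/*
 * Hypothetical helper: upper bound on packets received per second
 * by one interface, ignoring extra polling from the idle loop.
 */
static long
max_rx_pps(long pollhz, long burst_max)
{
	return pollhz * burst_max;
}

/* Seconds between events that run once every "frac" polls. */
static double
check_interval(long frac, long pollhz)
{
	return (double)frac / (double)pollhz;
}

/*
 * Worst-case Gigabit Ethernet frame rate: a 64-byte frame plus
 * 20 bytes of preamble, delimiter and inter-frame gap per packet.
 */
static long
gige_min_frame_pps(void)
{
	return 1000000000L / ((64 + 20) * 8);
}

int
main(void)
{
	char buf[80];

	snprintf(buf, sizeof(buf), "polling limit: %ld pps",
	    max_rx_pps(6000, 250));
	puts(buf);
	snprintf(buf, sizeof(buf), "GbE worst case: %ld pps",
	    gige_min_frame_pps());
	puts(buf);
	snprintf(buf, sizeof(buf), "tx check every %.1f us",
	    check_interval(1, 6000) * 1e6);
	puts(buf);
	snprintf(buf, sizeof(buf), "status check every %.1f ms",
	    check_interval(120, 6000) * 1e3);
	puts(buf);
	snprintf(buf, sizeof(buf), "poll period at pollhz=1000: %.1f ms",
	    check_interval(1, 1000) * 1e3);
	puts(buf);
	return 0;
}
.Ed
Run as is, it reports a polling limit of 1500000 packets per second
against a Gigabit Ethernet worst case of 1488095, which is consistent
with the default burst_max of 250 being adequate at a pollhz of 6000.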
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
drivers support
.Nm
of multiple reception queues.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_npoll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the drivers mentioned above
for more details.)
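.Pp
The shape of such a modification can be sketched in user space.
In the hypothetical driver below, the receive loop that would
otherwise live in the interrupt handler becomes
.Fn foo_rxeof ,
which takes a packet count limit.
The interrupt path,
.Fn foo_intr ,
calls it without a limit, while the polling callback,
.Fn foo_npoll_rx ,
passes on the budget it is given.
.Fn poll_tick
then hands out one tick's burst in chunks of each_burst packets,
round-robin among the registered devices, as described for
.Va net.ifpoll.X.rx.each_burst
above.
All names are invented for illustration; this is not the
.Dx
ifpoll interface.
.Bd -literal -offset indent
#include <stdio.h>

/* Hypothetical per-device state. */
struct foo_softc {
	const char	*name;
	int		 rx_pending;	/* packets waiting in the RX ring */
	int		 rx_done;	/* packets handed to the stack */
};

/*
 * Inner part of the former interrupt handler: process at most
 * "count" packets (a negative count means no limit) and return
 * the number actually processed.
 */
static int
foo_rxeof(struct foo_softc *sc, int count)
{
	int n = 0;

	while (sc->rx_pending > 0 && (count < 0 || n < count)) {
		sc->rx_pending--;	/* "receive" one packet */
		sc->rx_done++;
		n++;
	}
	return n;
}

/* Interrupt path: drain the ring completely. */
static void
foo_intr(struct foo_softc *sc)
{
	foo_rxeof(sc, -1);
}

/* Polling callback: bounded by the budget the poller passes in. */
static int
foo_npoll_rx(struct foo_softc *sc, int count)
{
	return foo_rxeof(sc, count);
}

/*
 * One poll tick: spend at most "burst" packets, in chunks of
 * "each_burst", round-robin over the registered devices, and stop
 * early once no device has anything left.
 */
static int
poll_tick(struct foo_softc *devs, int ndevs, int burst, int each_burst)
{
	int used = 0, progress = 1;

	while (used < burst && progress) {
		progress = 0;
		for (int i = 0; i < ndevs && used < burst; i++) {
			int chunk = each_burst;

			if (chunk > burst - used)
				chunk = burst - used;
			int n = foo_npoll_rx(&devs[i], chunk);
			used += n;
			if (n > 0)
				progress = 1;
		}
	}
	return used;
}

int
main(void)
{
	struct foo_softc devs[2] = {
		{ "foo0", 300, 0 },
		{ "foo1", 20, 0 },
	};
	char buf[80];
	int used;

	used = poll_tick(devs, 2, 250, 50);
	snprintf(buf, sizeof(buf), "tick used %d: %s=%d %s=%d", used,
	    devs[0].name, devs[0].rx_done, devs[1].name, devs[1].rx_done);
	puts(buf);

	/* With interrupts, the handler would drain what is left. */
	foo_intr(&devs[0]);
	snprintf(buf, sizeof(buf), "after interrupt: %s=%d",
	    devs[0].name, devs[0].rx_done);
	puts(buf);
	return 0;
}
.Ed
With a burst of 250 and chunks of 50, the busy device receives 230
packets in the tick and the quiet one all of its 20, so neither can
monopolize the budget.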
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling,
and added support for polling multiple reception queues.