.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.13 2007/11/03 07:35:52 swildner Exp $
.\"
.Dd May 23, 2013
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd network device driver polling support
.Sh SYNOPSIS
.Cd "options IFPOLL_ENABLE"
.Sh DESCRIPTION
Network device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll network devices, instead of
relying on the network devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle network devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
that is incurred when servicing interrupts, and
gives more control over the scheduling of a CPU between various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, network devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the network device.
The duration of the interrupt handler is potentially unbounded
unless the network device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Network device polling disables device interrupts and instead polls
network devices on clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much work is spent
handling network device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never the risk of livelock because
packets are not processed to completion.
.Ss Enabling polling
The
.Nm
feature is turned on and off with the
.Xr ifconfig 8
command.
An interface does not have to be
.Dq up
in order to turn on its
.Nm
feature.
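.Pp
As a minimal sketch, assuming the
.Xr ifconfig 8
keyword is
.Cm polling
(this page does not name the keyword, so check
.Xr ifconfig 8
on the release in use) and using
.Li em0
as a placeholder interface name, toggling the feature might look like:
.Bd -literal -offset indent
# turn polling on for em0 (keyword assumed, see ifconfig(8))
ifconfig em0 polling
# return em0 to interrupt mode
ifconfig em0 -polling
.Ed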
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5
.Em ( X
is the CPU number):
.Bl -tag -width indent -compact
.It Va net.ifpoll.burst_max
Default value for
.Va net.ifpoll.X.rx.burst_max
sysctl nodes.
.Pp
.It Va net.ifpoll.each_burst
Default value for
.Va net.ifpoll.X.rx.each_burst
sysctl nodes.
.Pp
.It Va net.ifpoll.user_frac
Default value for
.Va net.ifpoll.X.rx.user_frac
sysctl nodes.
.Pp
.It Va net.ifpoll.pollhz
Default value for
.Va net.ifpoll.X.pollhz
sysctl nodes.
.Pp
.It Va net.ifpoll.status_frac
Default value for
.Va net.ifpoll.0.status_frac
sysctl node.
.Pp
.It Va net.ifpoll.tx_frac
Default value for
.Va net.ifpoll.X.tx_frac
sysctl nodes.
.El
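.Pp
A minimal
.Xr loader.conf 5
sketch using the tunables above might look as follows.
The values simply repeat the defaults documented under
.Sx MIB Variables
and are illustrative only, not tuning recommendations:
.Bd -literal -offset indent
net.ifpoll.pollhz="6000"
net.ifpoll.burst_max="250"
net.ifpoll.each_burst="50"
net.ifpoll.user_frac="50"
net.ifpoll.status_frac="120"
net.ifpoll.tx_frac="1"
.Ed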
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va net.ifpoll.X.pollhz
The polling frequency, whose range is 1 to 30000.
Default is 6000.
.Pp
.It Va net.ifpoll.X.rx.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va net.ifpoll.X.rx.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue.
Default is 50.
.Pp
.It Va net.ifpoll.X.rx.burst_max
Upper bound for
.Va net.ifpoll.X.rx.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load.
Default is 250, which is adequate for a 1000Mbit network with pollhz=6000.
.Pp
.It Va net.ifpoll.X.rx.handlers
How many active network devices have registered for packet reception
.Nm .
.Pp
.It Va net.ifpoll.X.tx_frac
Controls how often (every
.Va tx_frac No / Va pollhz
seconds) the transmission queue is checked for packet transmission
done events.
Increasing this value reduces the time spent checking for packet
transmission done events and thus reduces bus load,
but it also increases the chance
that the transmission queue becomes saturated.
Default is 1.
.Pp
.It Va net.ifpoll.X.tx.handlers
How many active network devices have registered for packet transmission
.Nm .
.Pp
.It Va net.ifpoll.0.status_frac
Controls how often (every
.Va status_frac No / Va pollhz
seconds) the status registers of the network device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus,
but also delays the error detection.
Default is 120.
.Pp
.It Va net.ifpoll.0.status.handlers
How many active network devices have registered for status
.Nm .
.Pp
.It Va net.ifpoll.X.rx.short_ticks
.It Va net.ifpoll.X.rx.lost_polls
.It Va net.ifpoll.X.rx.pending_polls
.It Va net.ifpoll.X.rx.residual_burst
.It Va net.ifpoll.X.rx.phase
.It Va net.ifpoll.X.rx.suspect
.It Va net.ifpoll.X.rx.stalled
.It Va net.ifpoll.X.tx.short_ticks
.It Va net.ifpoll.X.tx.lost_polls
.It Va net.ifpoll.X.tx.pending_polls
.It Va net.ifpoll.X.tx.residual_burst
.It Va net.ifpoll.X.tx.phase
.It Va net.ifpoll.X.tx.suspect
.It Va net.ifpoll.X.tx.stalled
Debugging variables.
.El
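.Pp
As a worked example, assuming the default values listed above
(pollhz=6000, burst_max=250, tx_frac=1, status_frac=120),
the per-interface receive bound and the check intervals come out as:
.Bd -literal -offset indent
pollhz * burst_max   = 6000 * 250 = 1500000 packets/s
tx_frac / pollhz     = 1 / 6000   = ~0.17 ms
status_frac / pollhz = 120 / 6000 = 20 ms
.Ed
.Pp
For comparison, Gigabit Ethernet carries at most about 1488000
minimum-size frames per second
(84 bytes on the wire per frame, including preamble and inter-frame gap),
which is consistent with the default
.Va burst_max
being described as adequate for a 1000Mbit network.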
.Sh SUPPORTED DEVICES
Network device polling requires explicit modifications to
the network device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr bnx 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr emx 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr jme 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported,
with others in the works.
The
.Xr bce 4 ,
.Xr bnx 4 ,
.Xr emx 4 ,
.Xr igb 4 ,
and
.Xr jme 4
support multiple reception queue based
.Nm .
The
.Xr bce 4 ,
.Xr bnx 4 ,
certain types of
.Xr emx 4 ,
and
.Xr igb 4
support multiple transmission queue based
.Nm .
The modifications are rather straightforward, consisting of
the extraction of the inner part of the interrupt service routine
and the writing of a callback function,
.Fn *_npoll ,
which is invoked
to probe the network device for events and process them.
(See the
conditionally compiled sections of the network device drivers mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va net.ifpoll.X.pollhz
to at least 1000.
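.Pp
For illustration, taking CPU 0 as an example value of
.Em X ,
the following
.Xr sysctl 8
invocation sets the polling frequency to 1000,
which corresponds to a polling interval of 1/1000 s = 1 ms.
The choice of CPU and value here is an assumption made for the example:
.Bd -literal -offset indent
sysctl net.ifpoll.0.pollhz=1000
.Ed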
.Sh HISTORY
Network device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The network device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime,
added per-CPU polling
and added multiple reception and transmission queue polling support.