.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/share/man/man4/polling.4,v 1.27 2007/04/06 14:25:14 brueffer Exp $
.\" $DragonFly: src/share/man/man4/polling.4,v 1.12 2007/10/03 09:55:25 sephe Exp $
.\"
.Dd October 2, 2007
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead
incurred when servicing interrupts, and
gives more control over the scheduling of a CPU among various
tasks (user processes, software interrupts, device handling),
which ultimately reduces the chances of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Dx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
.Pp
Device polling disables device interrupts and instead polls devices on
clock interrupts.
This way, the context switch overhead is removed.
Furthermore,
the operating system can accurately control how much time is spent
handling device events, and thus prevent livelock by reserving
some amount of CPU time for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is never a risk of livelock caused by
packets not being processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off with the
.Xr ifconfig 8
command.
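.Pp
As a minimal sketch, assuming a polling-capable interface named
.Li em0
(the interface name is only an illustration) and assuming the
.Cm polling
parameter documented in
.Xr ifconfig 8
is available on the installed system,
.Nm
could be switched on and off like this:
.Bd -literal -offset indent
# Enable polling on the interface (assumed name: em0).
ifconfig em0 polling
# Return the interface to interrupt-driven operation.
ifconfig em0 -polling
.Ed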
.Ss Loader Tunables
The following tunables can be set from
.Xr loader.conf 5 :
.Bl -tag -width indent -compact
.It Va kern.polling.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.cpumask
A bitmask that controls which CPUs support device polling.
Default is 0xffffffff.
.El
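.Pp
For illustration only, a
.Xr loader.conf 5
fragment that keeps
.Nm
enabled but restricts it to two CPUs might look like the following.
The mask value 0x3 is an assumed example, not a recommendation, and
the sketch assumes that bit N of the mask corresponds to CPU N:
.Bd -literal -offset indent
# Keep device polling enabled at boot (non-zero enables it).
kern.polling.enable="1"
# Bits 0 and 1 set: under the assumption above, CPU 0 and CPU 1 only.
kern.polling.cpumask="0x3"
.Ed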
.Ss MIB Variables
The operation of
.Nm
is controlled by the following per-CPU
.Xr sysctl 8
MIB variables
.Em ( X
is the CPU number):
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.X.enable
If set to non-zero,
.Nm
is enabled.
Default is enabled.
.Pp
.It Va kern.polling.X.pollhz
The polling frequency in Hz; the valid range is 1 to 30000.
Default is 2000.
.Pp
.It Va kern.polling.cpumask
A read-only bitmask of the CPUs that support device polling.
.Pp
.It Va kern.polling.defcpu
The default CPU used to run device polling (read-only).
.Pp
.It Va kern.polling.X.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of the CPU cycles is reserved for userland tasks,
the remaining fraction being available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.X.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel,
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.X.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface
from saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.X.burst_max
Upper bound for
.Va kern.polling.X.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va pollhz No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with GigE cards).
Default is 150, which is adequate for a 100Mbit network with pollhz=1000.
.Pp
.It Va kern.polling.X.reg_frac
Controls how often (every
.Va reg_frac No / Va pollhz
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
the error detection.
Default is 20.
.Pp
.It Va kern.polling.X.handlers
How many active devices have registered for
.Nm .
.Pp
.It Va kern.polling.X.short_ticks
.It Va kern.polling.X.lost_polls
.It Va kern.polling.X.pending_polls
.It Va kern.polling.X.residual_burst
.It Va kern.polling.X.phase
.It Va kern.polling.X.suspect
.It Va kern.polling.X.stalled
Debugging variables.
.El
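.Pp
As a worked example of the two formulas above, using the documented
defaults and taking CPU 0 as an assumed example:
.Bd -literal -offset indent
# Inspect the current settings for CPU 0 (assumed CPU number).
sysctl kern.polling.0.pollhz kern.polling.0.burst_max

# Per interface packet ceiling when no spare idle cycles are available:
#   pollhz * burst_max = 2000 * 150 = 300000 packets per second
# Interval between status register checks:
#   reg_frac / pollhz  = 20 / 2000  = 0.01 seconds
.Ed
.Pp
With pollhz=1000 the same ceiling is 1000 * 150 = 150000 packets per
second, slightly above the roughly 148800 minimum-size frames per second
that 100Mbit Ethernet can carry.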
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bce 4 ,
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fxp 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
.Xr wi 4
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward, consisting of
extracting the inner part of the interrupt service routine
and writing a callback function,
.Fn *_poll ,
which is invoked
to probe the device for events and process them.
(See the
conditionally compiled sections of the devices mentioned above
for more details.)
.Pp
In order to reduce the latency in processing packets,
it is advisable to set the
.Xr sysctl 8
variable
.Va kern.polling.X.pollhz
to at least 1000.
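.Pp
One possible way to apply this, assuming CPU 0 is the CPU that runs
.Nm
(the CPU number is an assumption; check
.Va kern.polling.defcpu
on the target system):
.Bd -literal -offset indent
# Show which CPU runs device polling by default.
sysctl kern.polling.defcpu
# Set the polling frequency on CPU 0 (assumed) to 1000 Hz.
sysctl kern.polling.0.pollhz=1000
.Ed
.Pp
The same assignment can be placed in
.Xr sysctl.conf 5
so that it is applied at boot.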
.Sh HISTORY
Device polling first appeared in
.Fx 4.6 .
It was rewritten in
.Dx 1.3 .
.Sh AUTHORS
.An -nosplit
The device polling code was rewritten by
.An Matt Dillon
based on the original code by
.An Luigi Rizzo Aq luigi@iet.unipi.it .
.An Sepherosa Ziehau
made the polling frequency settable at runtime and added per-CPU polling.