.\" Copyright (c) 2002 Luigi Rizzo
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 26, 2020
.Dt POLLING 4
.Os
.Sh NAME
.Nm polling
.Nd device polling support
.Sh SYNOPSIS
.Cd "options DEVICE_POLLING"
.Sh DESCRIPTION
Device polling
.Nm (
for brevity) refers to a technique that
lets the operating system periodically poll devices, instead of
relying on the devices to generate interrupts when they need attention.
This might seem inefficient and counterintuitive, but when done
properly,
.Nm
gives the operating system more control over
when and how to handle devices, with a number of advantages in terms
of system responsiveness and performance.
.Pp
In particular,
.Nm
reduces the context switch overhead incurred when servicing
interrupts, and gives more control over how the CPU is shared among
various tasks (user processes, software interrupts, device handling),
which ultimately reduces the chance of livelock in the system.
.Ss Principles of Operation
In the normal, interrupt-based mode, devices generate an interrupt
whenever they need attention.
This in turn causes a
context switch and the execution of an interrupt handler
which performs whatever processing is needed by the device.
The duration of the interrupt handler is potentially unbounded
unless the device driver has been programmed with real-time
concerns in mind (which is generally not the case for
.Fx
drivers).
Furthermore, under heavy traffic load, the system might be
persistently processing interrupts without being able to
complete other work, either in the kernel or in userland.
This condition is known as livelock.
.Pp
With device polling, device interrupts are disabled and the
operating system instead polls devices at appropriate
times, i.e., on clock interrupts and within the idle loop.
This way, the per-interrupt context switch overhead is removed.
Furthermore,
the operating system can accurately control how much time to spend
on handling device events, and thus prevent livelock by reserving
some amount of CPU for other tasks.
.Pp
Enabling
.Nm
also changes the way software network interrupts
are scheduled, so there is no risk of livelock caused by packets
that are received but never processed to completion.
.Ss Enabling polling
Currently only network interface drivers support the
.Nm
feature.
It is turned on and off per interface with the
.Cm polling
and
.Cm -polling
options of
.Xr ifconfig 8 .
.Pp
The historic
.Va kern.polling.enable
sysctl, which enabled polling on all interfaces at once, can be
replaced with the following shell loop:
.Bd -literal
for i in $(ifconfig -l); do
	ifconfig "$i" polling	# use -polling to disable
done
.Ed
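.Pp
On a system where only some drivers support
.Nm ,
the loop can be restricted to interfaces that advertise the capability.
The following is a minimal sketch that assumes
.Ic ifconfig -m
prints a
.Dq capabilities=
line listing
.Dq POLLING
for such interfaces.
The helper names
.Ic has_polling_cap
and
.Ic enable_polling_all
are hypothetical and not part of the base system.
.Bd -literal
#!/bin/sh
# Hypothetical helper: succeeds if the "capabilities=" line read on
# stdin (from ifconfig -m) lists POLLING.  Assumes the
# capabilities=<hex><NAME,NAME,...> output format; the [,>] after
# POLLING keeps longer names such as POLLING_NOCOUNT from matching.
has_polling_cap() {
	grep -q 'capabilities=.*[<,]POLLING[,>]'
}

# Hypothetical helper: enable polling (or, given -polling as the
# first argument, disable it) on every interface that advertises it.
enable_polling_all() {
	mode=${1:-polling}
	for i in $(ifconfig -l); do
		if ifconfig -m "$i" | has_polling_cap; then
			ifconfig "$i" "$mode"
		fi
	done
}
.Ed
Run
.Ic enable_polling_all
as root to turn polling on, or
.Ic enable_polling_all -polling
to turn it off again.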
.Ss MIB Variables
The operation of
.Nm
is controlled by the following
.Xr sysctl 8
MIB variables:
.Pp
.Bl -tag -width indent -compact
.It Va kern.polling.user_frac
When
.Nm
is enabled, and provided that there is some work to do,
up to this percentage of CPU cycles is reserved for userland tasks;
the remaining fraction is available for
.Nm
processing.
Default is 50.
.Pp
.It Va kern.polling.burst
Maximum number of packets grabbed from each network interface in
each timer tick.
This number is dynamically adjusted by the kernel
according to the programmed
.Va user_frac , burst_max ,
CPU speed, and system load.
.Pp
.It Va kern.polling.each_burst
The burst above is split into smaller chunks of this number of
packets, going round-robin among all interfaces registered for
.Nm .
This prevents a large burst from a single interface from
saturating the IP interrupt queue
.Pq Va net.inet.ip.intr_queue_maxlen .
Default is 5.
.Pp
.It Va kern.polling.burst_max
Upper bound for
.Va kern.polling.burst .
Note that when
.Nm
is enabled, each interface can receive at most
.Pq Va HZ No * Va burst_max
packets per second unless there are spare CPU cycles available for
.Nm
in the idle loop.
This number should be tuned to match the expected load
(which can be quite high with gigabit Ethernet cards).
Default is 150, which is adequate for a 100 Mbit/s network with HZ=1000.
.Pp
.It Va kern.polling.idle_poll
Controls whether
.Nm
is also performed in the idle loop.
There is no reason to disable this other than power saving or bugs in
the scheduler's handling of idle priority kernel threads.
.Pp
.It Va kern.polling.reg_frac
Controls how often (every
.Va reg_frac No / Va HZ
seconds) the status registers of the device are checked for error
conditions and the like.
Increasing this value reduces the load on the bus, but also delays
error detection.
Default is 20.
.Pp
.It Va kern.polling.handlers
Number of active devices registered for
.Nm .
.Pp
.It Va kern.polling.short_ticks
.It Va kern.polling.lost_polls
.It Va kern.polling.pending_polls
.It Va kern.polling.residual_burst
.It Va kern.polling.phase
.It Va kern.polling.suspect
.It Va kern.polling.stalled
Debugging variables.
.El
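.Pp
The interplay of
.Va user_frac ,
.Va burst ,
.Va burst_max
and
.Va each_burst
can be illustrated with a simplified shell model.
The sketch below is not kernel code, and its function names are
hypothetical.
It assumes that once per tick the kernel compares the percentage of the
tick spent in
.Nm
with the share not reserved by
.Va user_frac
and nudges
.Va burst
down or up by one packet, within 1 and
.Va burst_max ,
and that the per-interface
.Va burst
is then handed out in chunks of at most
.Va each_burst
packets, with every registered interface polled once per chunk.
.Bd -literal
#!/bin/sh
# Simplified model of kern.polling.* behaviour; NOT kernel code.
# Function names are hypothetical.

# adjust_burst LOAD USER_FRAC BURST BURST_MAX
# LOAD is the percentage of the last tick spent polling.  If it
# exceeds the share not reserved for userland (100 - USER_FRAC),
# shrink BURST by one packet, otherwise grow it by one, keeping it
# within 1..BURST_MAX.  Prints the new BURST.
adjust_burst() {
	load=$1 user_frac=$2 burst=$3 burst_max=$4
	if [ "$load" -gt $((100 - user_frac)) ]; then
		if [ "$burst" -gt 1 ]; then
			burst=$((burst - 1))
		fi
	elif [ "$burst" -lt "$burst_max" ]; then
		burst=$((burst + 1))
	fi
	echo "$burst"
}

# poll_tick BURST EACH_BURST QLEN...
# Splits one tick's per-interface budget BURST into chunks of at most
# EACH_BURST packets.  In each chunk every interface (one QLEN, the
# number of waiting packets, per interface) is polled in turn and
# yields up to the chunk size.  Prints one line per chunk.
poll_tick() {
	residual=$1 each=$2
	shift 2
	while [ "$residual" -gt 0 ]; do
		if [ "$residual" -lt "$each" ]; then
			chunk=$residual
		else
			chunk=$each
		fi
		residual=$((residual - chunk))
		line="" rest=""
		for q in "$@"; do
			if [ "$q" -lt "$chunk" ]; then
				got=$q
			else
				got=$chunk
			fi
			line="$line $got"
			rest="$rest $((q - got))"
		done
		echo "chunk $chunk:$line"
		# Remaining queue lengths become the new argument list.
		set -- $rest
	done
}
.Ed
For example,
.Ic poll_tick 12 5 7 20
models two interfaces holding 7 and 20 packets with a budget of 12.
It prints
.Dq chunk 5: 5 5 ,
.Dq chunk 5: 2 5
and
.Dq chunk 2: 0 2 ,
so the busier interface yields its 12 packets in interleaved slices of
at most 5 rather than in one large burst.
The model ignores idle-loop polling and any further corrections the
kernel may apply.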
.Sh SUPPORTED DEVICES
Device polling requires explicit modifications to the device drivers.
As of this writing, the
.Xr bge 4 ,
.Xr dc 4 ,
.Xr em 4 ,
.Xr fwe 4 ,
.Xr fwip 4 ,
.Xr fxp 4 ,
.Xr igb 4 ,
.Xr nfe 4 ,
.Xr nge 4 ,
.Xr re 4 ,
.Xr rl 4 ,
.Xr sis 4 ,
.Xr ste 4 ,
.Xr stge 4 ,
.Xr vge 4 ,
.Xr vr 4 ,
and
.Xr xl 4
devices are supported, with others in the works.
The modifications are rather straightforward: the inner part of the
interrupt service routine is extracted into a callback function,
.Fn *_poll ,
which is invoked to probe the device for events and process them.
(See the sections of the drivers mentioned above that are conditionally
compiled on
.Dv DEVICE_POLLING
for more details.)
.Pp
Because in the worst case devices are polled only on clock interrupts,
it is not advisable to lower the clock frequency below 1000 Hz,
as doing so increases packet processing latency.
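.Pp
The arithmetic behind the clock frequency and
.Va burst_max
recommendations can be checked with a short shell sketch.
It computes the worst-case interval between polls (one clock tick), the
.Pq Va HZ No * Va burst_max
packets-per-second bound, the
.Va reg_frac No / Va HZ
status-check interval, and, for comparison, the rate of minimum-size
Ethernet frames at a given link speed, assuming 84 bytes per frame on
the wire (a 64-byte frame plus preamble and inter-frame gap).
The function names are hypothetical.
.Bd -literal
#!/bin/sh
# Hypothetical helpers for the polling arithmetic; integer math only.

# poll_interval_us HZ: worst-case gap between polls, in microseconds,
# when devices are polled only on clock interrupts.
poll_interval_us() {
	echo $((1000000 / $1))
}

# max_pps HZ BURST_MAX: upper bound on packets per second per
# interface when there are no spare cycles in the idle loop.
max_pps() {
	echo $(($1 * $2))
}

# reg_interval_ms HZ REG_FRAC: how often, in milliseconds, the
# status registers are checked (REG_FRAC / HZ seconds).
reg_interval_ms() {
	echo $(($2 * 1000 / $1))
}

# line_rate_pps MBIT: minimum-size Ethernet frames per second,
# assuming 84 bytes (672 bits) per frame on the wire.
line_rate_pps() {
	echo $(($1 * 1000000 / 672))
}
.Ed
With HZ=1000 and the default
.Va burst_max
of 150,
.Ic max_pps 1000 150
yields 150000 packets per second, just above the 148809 reported by
.Ic line_rate_pps 100
for a 100 Mbit/s link.
For gigabit Ethernet,
.Ic line_rate_pps 1000
yields 1488095, suggesting a
.Va burst_max
of roughly 1500 at the same HZ.
Lowering HZ to 100 stretches the poll interval to 10000 microseconds
and cuts the bound to 15000 packets per second.
On a live system the inputs can be read with
.Ic sysctl -n kern.hz
and
.Ic sysctl -n kern.polling.burst_max .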
.Sh HISTORY
Device polling first appeared in
.Fx 4.6
and
.Fx 5.0 .
.Sh AUTHORS
Device polling was written by
.An Luigi Rizzo Aq Mt luigi@iet.unipi.it .