.\"	$OpenBSD: sosplice.9,v 1.7 2013/07/17 20:21:55 schwarze Exp $
.\"
.\" Copyright (c) 2011-2013 Alexander Bluhm <bluhm@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: July 17 2013 $
.Dt SOSPLICE 9
.Os
.Sh NAME
.Nm sosplice ,
.Nm somove
.Nd splice two sockets for zero-copy data transfer
.Sh SYNOPSIS
.Ft int
.Fn sosplice "struct socket *so" "int fd" "off_t max" "struct timeval *tv"
.Ft int
.Fn somove "struct socket *so" "int wait"
.Sh DESCRIPTION
The function
.Fn sosplice
is used to splice together a source and a drain socket.
The source socket is passed as the
.Fa so
argument;
the file descriptor of the drain is passed in
.Fa fd .
If
.Fa fd
is negative, an existing splicing gets dissolved.
If
.Fa max
is positive, at most that many bytes will get transferred.
If
.Fa tv
is not NULL, a
.Xr timeout 9
is scheduled to dissolve splicing in the case when no data can be
transferred for the specified period of time.
Socket splicing can be invoked from userland via the
.Xr setsockopt 2
system call at the
.Dv SOL_SOCKET
level with the socket option
.Dv SO_SPLICE .
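.Pp
As an illustration, the userland call described above might look like the
following sketch.
It assumes the
.Vt struct splice
argument with the members
.Fa sp_fd ,
.Fa sp_max
and
.Fa sp_idle
as declared in
.In sys/socket.h
on
.Ox ;
the helper name
.Fn splice_sockets
is hypothetical, and on systems without
.Dv SO_SPLICE
the sketch simply fails with
.Er ENOPROTOOPT .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: splice the source socket src to the drain
 * socket drain.  A max of 0 means no byte limit; a NULL idle means
 * no idle timeout.  Returns 0 on success, -1 with errno set on error.
 */
int
splice_sockets(int src, int drain, off_t max, const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;	/* assumed layout, see lead-in */

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;
	sp.sp_max = max;
	if (idle != NULL)
		sp.sp_idle = *idle;
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Socket splicing is not available on this system. */
	(void)src;
	(void)drain;
	(void)max;
	(void)idle;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```
.Ed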
.Pp
Before connecting both sockets, several checks are executed.
See the
.Sx ERRORS
section for possible failures.
The connection between both sockets is implemented by setting these
additional fields in
.Vt struct socket :
.Pp
.Bl -dash -compact -offset indent
.It
.Vt struct socket Fa *so_splice
links from the source to the drain socket.
.It
.Vt struct socket Fa *so_spliceback
links back from the drain to the source socket.
.It
.Vt off_t Fa so_splicelen
counts the number of bytes spliced so far from this socket.
.It
.Vt off_t Fa so_splicemax
specifies the maximum number of bytes to splice from this socket if
non-zero.
.It
.Vt struct timeval Fa so_idletv
specifies the maximum idle time if non-zero.
.It
.Vt struct timeout Fa so_idleto
provides storage for the kernel timeout if idle time is used.
.El
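.Pp
The linkage formed by these fields can be pictured with a small userland
model.
This is a sketch only, not the kernel code:
.Vt struct model_socket ,
.Fn model_splice
and
.Fn model_unsplice
are hypothetical names, the timeout fields are left out, and only the
.Er EINVAL
and
.Er EBUSY
checks from the
.Sx ERRORS
section are modelled.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>

/* Userland model only: mirrors the splice fields listed above. */
struct model_socket {
	struct model_socket *so_splice;		/* source -> drain */
	struct model_socket *so_spliceback;	/* drain -> source */
	off_t so_splicelen;			/* bytes spliced so far */
	off_t so_splicemax;			/* 0 means no limit */
};

/* Link src to drain; returns 0 or an error number. */
int
model_splice(struct model_socket *src, struct model_socket *drain, off_t max)
{
	if (max < 0)
		return EINVAL;
	/* The source or the drain is already spliced. */
	if (src->so_splice != NULL || drain->so_spliceback != NULL)
		return EBUSY;
	src->so_splice = drain;
	drain->so_spliceback = src;
	src->so_splicelen = 0;
	src->so_splicemax = max;
	return 0;
}

/* Dissolve the link that starts at src, if there is one. */
void
model_unsplice(struct model_socket *src)
{
	if (src->so_splice == NULL)
		return;
	src->so_splice->so_spliceback = NULL;
	src->so_splice = NULL;
}
```
.Ed
.Pp
In this model a socket can be the source of one splice and the drain of
another at the same time, so two sockets may be spliced in both
directions.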
.Pp
After connecting both sockets,
.Fn sosplice
calls
.Fn somove
to transfer the mbufs already in the source receive buffer to the
drain send buffer.
Finally the socket buffer flag
.Dv SB_SPLICE
is set on both socket buffers, to indicate that the protocol layer
has to call
.Fn somove
whenever data or space is available.
.Pp
The function
.Fn somove
transfers data from the source's receive buffer to the drain's send
buffer.
It must be called at
.Xr splsoftnet 9
and
.Fa so
must be a spliced source socket.
It may be necessary to split an mbuf to handle out-of-band data
inline or when the maximum splice length has been reached.
If
.Fa wait
is
.Dv M_WAIT ,
splitting mbufs will always succeed.
For
.Dv M_DONTWAIT
the out-of-band property might get lost or a short splice might
happen.
In the latter case, less than the given maximum number of bytes are
transferred and userland has to cope with this.
Note that a short splice cannot happen if
.Fn somove
was called by
.Fn sosplice .
So a second
.Xr setsockopt 2
after a short splice pointing to the same maximum will always
succeed.
.Pp
Before transferring data,
.Fn somove
checks both sockets for errors and that the drain socket is connected.
If the drain cannot send anymore, an
.Er EPIPE
error is set on the source socket.
The data length to move is limited by the optional maximum splice
length and the space in the drain's send socket buffer.
Up to this amount of data is taken out of the source's receive
socket buffer.
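.Pp
The length limit just described amounts to taking a minimum over three
quantities.
The following userland sketch shows the arithmetic; the function
.Fn splice_move_len
and its parameter names are hypothetical and do not appear in the kernel.
.Bd -literal -offset indent
```c
#include <sys/types.h>

/*
 * Hypothetical helper: how many bytes may be moved in one pass.
 * avail     - bytes waiting in the source's receive buffer
 * space     - free space in the drain's send buffer
 * splicemax - maximum splice length, 0 if unlimited
 * splicelen - bytes already spliced from the source
 */
off_t
splice_move_len(off_t avail, off_t space, off_t splicemax, off_t splicelen)
{
	off_t len = avail;

	if (len > space)
		len = space;
	if (splicemax > 0 && len > splicemax - splicelen)
		len = splicemax - splicelen;
	return len < 0 ? 0 : len;
}
```
.Ed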
.Pp
For atomic protocols, either one complete packet is taken out, or
nothing is taken at all if:
the packet is bigger than the drain's send buffer size, in which
case the splicing gets aborted with an
.Er EMSGSIZE
error;
the packet does not fit into the drain's current send buffer space,
in which case it is left in the source's receive buffer for later
processing;
or the maximum splice length is located within a packet, in which
case splicing gets dissolved like a short splice.
All address or control mbufs associated with the taken packet are
dropped.
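.Pp
The three cases for atomic protocols can be summarized as a decision
function.
This is a userland sketch under stated assumptions: the names
.Vt enum atomic_action
and
.Fn atomic_decide
are hypothetical, and the order in which the conditions are tested simply
follows the order of the text above, which need not match the kernel.
.Bd -literal -offset indent
```c
#include <sys/types.h>

enum atomic_action {
	ATOMIC_TAKE,		/* move the whole packet to the drain */
	ATOMIC_LATER,		/* leave it in the source receive buffer */
	ATOMIC_SHORT,		/* dissolve like a short splice */
	ATOMIC_EMSGSIZE		/* abort splicing with EMSGSIZE */
};

/*
 * Hypothetical helper: decide what happens to one packet.
 * pktlen      - length of the packet
 * drain_size  - total size of the drain's send buffer
 * drain_space - space currently free in the drain's send buffer
 * splicemax   - maximum splice length, 0 if unlimited
 * splicelen   - bytes already spliced from the source
 */
enum atomic_action
atomic_decide(off_t pktlen, off_t drain_size, off_t drain_space,
    off_t splicemax, off_t splicelen)
{
	if (pktlen > drain_size)
		return ATOMIC_EMSGSIZE;
	if (pktlen > drain_space)
		return ATOMIC_LATER;
	if (splicemax > 0 && pktlen > splicemax - splicelen)
		return ATOMIC_SHORT;
	return ATOMIC_TAKE;
}
```
.Ed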
.Pp
If the maximum splice length has been reached, an mbuf may get
split for non-atomic protocols.
Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing.
If
.Dv SO_OOBINLINE
is set, out-of-band data will get moved as such
although this might not be reliable.
The data is sent out to the drain socket via the protocol function.
If that fails and the drain socket cannot send anymore, an
.Er EPIPE
error is set on the source socket.
.Pp
For packet oriented protocols
.Fn somove
iterates over the next packet queue.
.Pp
If a maximum splice length was specified and at least this amount
of data has been received from the drain socket, splicing gets
dissolved.
In this case, an
.Er EFBIG
error is set on the source socket if the maximum amount of data has
been transferred.
Userland can process this error to distinguish the full splice from
a short splice or to react to the completed maximum splice immediately.
If an idle timeout was specified and no data has been transferred
for that period of time, the handler
.Fn soidle
dissolves splicing and sets an
.Er ETIMEDOUT
error on the source socket.
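.Pp
A userland program might tell these outcomes apart as in the following
sketch.
It assumes that the pending error of the source socket is fetched with
.Xr getsockopt 2
and
.Dv SO_ERROR ,
which is the usual way to read a socket error but is not mandated by this
page; the names
.Vt enum splice_end ,
.Fn splice_end_reason
and
.Fn source_error
are hypothetical.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

enum splice_end {
	SPLICE_END_NOERROR,	/* no error set, e.g. a short splice */
	SPLICE_END_MAX,		/* EFBIG: maximum length transferred */
	SPLICE_END_IDLE,	/* ETIMEDOUT: idle timeout fired */
	SPLICE_END_DRAIN,	/* EPIPE: drain cannot send anymore */
	SPLICE_END_ERROR	/* any other socket error */
};

/* Map the error found on the source socket to a reason. */
enum splice_end
splice_end_reason(int soerror)
{
	switch (soerror) {
	case 0:
		return SPLICE_END_NOERROR;
	case EFBIG:
		return SPLICE_END_MAX;
	case ETIMEDOUT:
		return SPLICE_END_IDLE;
	case EPIPE:
		return SPLICE_END_DRAIN;
	default:
		return SPLICE_END_ERROR;
	}
}

/*
 * Fetch the pending error of the source socket src.
 * If getsockopt() itself fails, its errno is returned instead.
 */
int
source_error(int src)
{
	int error = 0;
	socklen_t len = sizeof(error);

	if (getsockopt(src, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
		return errno;
	return error;
}
```
.Ed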
.Pp
The function
.Fn sounsplice
is called to dissolve the socket splicing if the source socket
cannot receive anymore and its receive buffer is empty; or if the
drain socket cannot send anymore; or if the maximum has been reached;
or if an error occurred; or if the idle timeout has fired.
.Pp
If the socket buffer flag
.Dv SB_SPLICE
is set, the functions
.Fn sorwakeup
and
.Fn sowwakeup
will call
.Fn somove
to trigger the transfer when new data or buffer space is available.
While socket splicing is active, any
.Xr read 2
from the source socket will block and the wakeup will not be delivered
to the file descriptor.
A read event or a socket error is signaled to userland after
dissolving.
.Sh RETURN VALUES
.Fn sosplice
returns 0 on success and otherwise the error number.
.Fn somove
returns 0 if socket splicing has been finished and 1 if it continues.
.Sh ERRORS
.Fn sosplice
will succeed unless:
.Bl -tag -width Er
.It Bq Er EBADF
The given file descriptor
.Fa fd
is not an active descriptor.
.It Bq Er EBUSY
The source or the drain socket is already spliced.
.It Bq Er EINVAL
The given maximum value
.Fa max
is negative.
.It Bq Er ENOTCONN
The source socket requires a connection and is neither connected
nor in the process of connecting to a peer.
.It Bq Er ENOTCONN
The drain socket is neither connected nor in the process of connecting
to a peer.
.It Bq Er ENOTSOCK
The given file descriptor
.Fa fd
is not a socket.
.It Bq Er EOPNOTSUPP
The source or the drain socket is a listen socket.
.It Bq Er EPROTONOSUPPORT
The source socket's protocol layer does not have the
.Dv PR_SPLICE
flag set.
Only TCP and UDP socket splicing is supported.
.It Bq Er EPROTONOSUPPORT
The drain socket's protocol does not have the same
.Fa pr_usrreq
function as the source.
.It Bq Er EWOULDBLOCK
The source socket is non-blocking and the receive buffer is already
locked.
.El
.Sh SEE ALSO
.Xr setsockopt 2 ,
.Xr options 4 ,
.Xr timeout 9
.Sh HISTORY
Socket splicing for TCP first appeared in
.Ox 4.9 ;
support for UDP was added in
.Ox 5.3 .
.Sh AUTHORS
.An -nosplit
The idea for socket splicing originally came from
.An Markus Friedl Aq Mt markus@openbsd.org ,
and
.An Alexander Bluhm Aq Mt bluhm@openbsd.org
implemented it.
.An Mike Belopuhov Aq Mt mikeb@openbsd.org
added the timeout feature.