.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload.  For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.
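The dependency between TSO and Tx checksum offload described above can be
illustrated with a small user-space sketch.  It is not kernel code: the
DEMO_F_* bits and demo_fix_features() are hypothetical names invented for
this example, and the real NETIF_F_* values and fixup logic differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration; not the real NETIF_F_* values. */
#define DEMO_F_TX_CSUM (1u << 0)
#define DEMO_F_TSO     (1u << 1)

/* Drop TSO from a feature mask when Tx checksum offload is not enabled. */
uint32_t demo_fix_features(uint32_t features)
{
	if (!(features & DEMO_F_TX_CSUM))
		features &= ~DEMO_F_TSO;
	return features;
}
```

A mask with both bits set is returned unchanged, while a mask that requests
only DEMO_F_TSO comes back with that bit cleared.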

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header.  In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment.  If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID.  If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.
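The two IP ID behaviors above can be modeled with the following user-space
sketch.  demo_segment_ip_id() and its parameters are hypothetical; the
fixed_id flag stands in for SKB_GSO_TCP_FIXEDID being set in gso_type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the IPv4 ID for segment number seg (0-based) of a frame whose
 * original header carried base_id.  fixed_id models SKB_GSO_TCP_FIXEDID.
 */
uint16_t demo_segment_ip_id(uint16_t base_id, unsigned int seg, bool fixed_id)
{
	if (fixed_id)
		return base_id;			/* every segment repeats the ID */
	return (uint16_t)(base_id + seg);	/* increments, wrapping at 16 bits */
}
```

With base_id 100, the fourth segment (seg 3) gets ID 103 by default and ID
100 when fixed_id is true.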


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments.  Many of the requirements for UDP
fragmentation offload are the same as TSO.  However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above, it is possible for a frame to
contain additional headers such as an outer tunnel.  In order to account
for such instances an additional set of segmentation offload types was
introduced, including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL.  These extra segmentation types are used to identify
cases where there is more than one set of headers.  For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported.  The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.  Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM.  These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in it.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload.  In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.  In GSO, the data of a given skbuff is broken out over multiple
skbuffs that have been resized to match the MSS provided via
skb_shinfo()->gso_size.
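The splitting described above can be sketched in user space as follows.
This only models the arithmetic of breaking a payload into MSS-sized
pieces; demo_gso_segment() is a hypothetical helper and does not build or
copy any skbuffs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split payload_len bytes into segments of at most gso_size bytes.
 * Each segment length is stored in seg_len[] (capacity max_segs).
 * Returns the number of segments, or 0 if gso_size is zero, the payload
 * is empty, or seg_len[] is too small to hold every segment.
 */
size_t demo_gso_segment(size_t payload_len, size_t gso_size,
			size_t *seg_len, size_t max_segs)
{
	size_t n = 0;

	if (gso_size == 0)
		return 0;

	while (payload_len > 0) {
		size_t len = payload_len < gso_size ? payload_len : gso_size;

		if (n == max_segs)
			return 0;
		seg_len[n++] = len;
		payload_len -= len;
	}
	return n;
}
```

For a 4000 byte payload and a gso_size of 1448 this yields three segments
of 1448, 1448 and 1104 bytes; only the last segment may be shorter than
gso_size.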

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO.  Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO.  Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO.  The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO.  It
takes advantage of certain traits of TCP and tunnels so that, instead of
having to rewrite the packet headers for each segment, only the inner-most
transport header and possibly the outer-most network header need to be
updated.  This allows devices that do not support tunnel offloads or tunnel
offloads with checksum to still make use of segmentation.

With the partial offload, all headers excluding the inner transport header
are updated such that they will contain the correct values as if the header
was simply duplicated.  The one exception to this is the outer IPv4 ID
field.  It is up to the device drivers to guarantee that the IPv4 ID field
is incremented in the case that a given header does not have the DF bit
set.


SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
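
The need to treat GSO_BY_FRAGS specially in size checks can be illustrated
with a user-space model.  Everything here is hypothetical: struct demo_skb,
struct demo_frag and demo_max_seg_len() are simplified stand-ins rather
than kernel structures or helpers, and DEMO_GSO_BY_FRAGS is assumed to
mirror the kernel's sentinel value.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's GSO_BY_FRAGS sentinel; demo name only. */
#define DEMO_GSO_BY_FRAGS 0xFFFFu

struct demo_frag {
	size_t len;			/* length of this pre-built segment */
	const struct demo_frag *next;	/* next chained segment, or NULL */
};

struct demo_skb {
	size_t len;			/* total payload length */
	unsigned int gso_size;		/* 0, an MSS, or DEMO_GSO_BY_FRAGS */
	const struct demo_frag *frags;	/* chained segments (SCTP case) */
};

/*
 * Return the largest segment that segmenting skb would produce.
 * With DEMO_GSO_BY_FRAGS the boundaries are the chained fragments, so
 * gso_size must not be interpreted as a length.
 */
size_t demo_max_seg_len(const struct demo_skb *skb)
{
	const struct demo_frag *f;
	size_t max = 0;

	if (skb->gso_size == 0)		/* not a GSO packet */
		return skb->len;

	if (skb->gso_size != DEMO_GSO_BY_FRAGS)
		return skb->len < skb->gso_size ? skb->len : skb->gso_size;

	for (f = skb->frags; f; f = f->next)
		if (f->len > max)
			max = f->len;
	return max;
}
```

A caller comparing the result against an MTU gets the longest chained
fragment for the SCTP case, rather than the meaningless value 0xFFFF.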
185