.. SPDX-License-Identifier: GPL-2.0

=====================
Segmentation Offloads
=====================


Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:

 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS


TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason, TSO is normally disabled if the Tx checksum
offload for a given device is disabled.
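
The relationship between gso_size and the frames a device emits can be
illustrated with a small userspace model. This is a sketch, not kernel
code: struct model_shinfo and its fields are hypothetical stand-ins for
the gso_type, gso_size and gso_segs members of skb_shinfo(), and the
MODEL_GSO_* flags are placeholders for SKB_GSO_TCPV4 and SKB_GSO_TCPV6.
It does not model the checksum requirement.

.. code-block:: c

    /* Hypothetical placeholders for SKB_GSO_TCPV4 / SKB_GSO_TCPV6. */
    #define MODEL_GSO_TCPV4 (1u << 0)
    #define MODEL_GSO_TCPV6 (1u << 1)

    /* Hypothetical stand-in for the GSO fields of skb_shinfo(). */
    struct model_shinfo {
        unsigned int gso_type;
        unsigned int gso_size; /* MSS: payload bytes per output frame */
        unsigned int gso_segs; /* number of frames the payload becomes */
    };

    /* Request TCP segmentation for payload_len bytes of TCP payload.
     * Fails if mss is zero, since gso_size must be non-zero. */
    static int model_request_tso(struct model_shinfo *sh, int is_ipv6,
                                 unsigned int payload_len, unsigned int mss)
    {
        if (mss == 0)
            return -1;
        sh->gso_type = is_ipv6 ? MODEL_GSO_TCPV6 : MODEL_GSO_TCPV4;
        sh->gso_size = mss;
        sh->gso_segs = (payload_len + mss - 1) / mss; /* round up */
        return 0;
    }

    /* Payload length carried by frame idx (0-based): every frame carries
     * gso_size bytes except possibly the last, which gets the remainder. */
    static unsigned int model_tso_frame_len(const struct model_shinfo *sh,
                                            unsigned int payload_len,
                                            unsigned int idx)
    {
        unsigned int start = idx * sh->gso_size;

        if (idx >= sh->gso_segs)
            return 0;
        return payload_len - start < sh->gso_size ? payload_len - start
                                                  : sh->gso_size;
    }

For example, a 4000-byte payload with an MSS of 1448 becomes three frames
carrying 1448, 1448 and 1104 bytes.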

In order to support TCP segmentation offload, it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation, we support one of two behaviors for the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified, then we will not increment the
IP ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set, then the IP ID can be ignored when performing
TSO, and we will either increment the IP ID for all frames or leave it at
a static value, based on driver preference.


UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as for TSO. However, the IPv4 ID should
not be incremented across fragments, since they all belong to a single
IPv4 datagram.

UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.
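
The IP ID rules for TSO and UFO can be contrasted in a short sketch. It
assumes a hypothetical enum whose two values stand for the default TSO
behavior (a new ID per segment) and the fixed-ID behavior shared by
SKB_GSO_TCP_FIXEDID segments and UFO fragments. It is not the kernel's
implementation and it ignores the NETIF_F_TSO_MANGLEID case, where the
choice is left to the driver.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical modes for this sketch only. */
    enum model_ipid_mode {
        MODEL_IPID_INCREMENT, /* default TSO: one new ID per segment */
        MODEL_IPID_FIXED,     /* SKB_GSO_TCP_FIXEDID segments, UFO fragments */
    };

    /* Fill ids[0..count-1] with the IPv4 ID carried by each output packet. */
    static void model_assign_ip_ids(enum model_ipid_mode mode, uint16_t first_id,
                                    uint16_t *ids, unsigned int count)
    {
        for (unsigned int i = 0; i < count; i++) {
            /* The IPv4 ID is 16 bits wide, so the cast wraps it around. */
            ids[i] = mode == MODEL_IPID_INCREMENT ? (uint16_t)(first_id + i)
                                                  : first_id;
        }
    }

Starting from an ID of 0xFFFE, three incrementing segments carry 0xFFFE,
0xFFFF and 0, whereas three fragments of one UDP datagram all carry the
same ID.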


IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above, it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances, an additional set of segmentation offload types was
introduced, including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers.
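
To make the outer/inner split concrete, the following sketch computes the
six header offsets for a UDP tunnel frame. It assumes a VXLAN-style layout
(Ethernet, IPv4 without options, UDP, an 8-byte tunnel header, then an
inner Ethernet, IPv4 and TCP header). The struct and macros are
hypothetical; they only mirror what skb_mac_header(), skb_network_header(),
skb_transport_header() and their skb_inner_*() counterparts would point at.

.. code-block:: c

    /* Header lengths assumed by this sketch (no IP options, no VLAN tags). */
    #define MODEL_ETH_HLEN     14
    #define MODEL_IPV4_HLEN    20
    #define MODEL_UDP_HLEN      8
    #define MODEL_TUNNEL_HLEN   8 /* e.g. a VXLAN header */

    /* Hypothetical record of the offsets the encapsulation path would set. */
    struct model_tunnel_offsets {
        unsigned int mac, network, transport;                   /* outer */
        unsigned int inner_mac, inner_network, inner_transport; /* inner */
    };

    static struct model_tunnel_offsets model_udp_tunnel_offsets(void)
    {
        struct model_tunnel_offsets o;

        o.mac = 0;
        o.network = o.mac + MODEL_ETH_HLEN;
        o.transport = o.network + MODEL_IPV4_HLEN; /* outer UDP header */
        /* The inner frame starts after the outer UDP and tunnel headers. */
        o.inner_mac = o.transport + MODEL_UDP_HLEN + MODEL_TUNNEL_HLEN;
        o.inner_network = o.inner_mac + MODEL_ETH_HLEN;
        o.inner_transport = o.inner_network + MODEL_IPV4_HLEN; /* inner TCP */
        return o;
    }

With these assumed lengths, the outer offsets are 0, 14 and 34 bytes and
the inner offsets are 50, 64 and 84 bytes.
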
Below is the list of calls to access the given headers:

IPIP/SIT Tunnel::

             Outer                  Inner
  MAC        skb_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header

UDP/GRE Tunnel::

             Outer                  Inner
  MAC        skb_mac_header         skb_inner_mac_header
  Network    skb_network_header     skb_inner_network_header
  Transport  skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types, there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests a non-zero checksum to be
included in it.

Finally, there is SKB_GSO_TUNNEL_REMCSUM, which indicates that a given
tunnel header has requested a remote checksum offload. In this case, the
inner headers will be left with a partial checksum and only the outer
header checksum will be computed.


Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above.
In GSO, a given skbuff has its data broken out over multiple skbuffs,
each resized to match the MSS provided via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload, a corresponding software
offload is required in GSO. Otherwise, a frame could be re-routed between
devices and end up impossible to transmit.


Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally, any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is the IPv4 ID when the DF bit is set in a given IP header. If the
IPv4 ID values are not sequentially incrementing, they will be altered so
that they are when a frame assembled via GRO is segmented via GSO.


Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. It
takes advantage of certain traits of TCP and tunnels so that, instead of
having to rewrite the packet headers for each segment, only the inner-most
transport header and possibly the outer-most network header need to be
updated.
This allows devices that do not support tunnel offloads, or tunnel
offloads with checksum, to still make use of segmentation.

With the partial offload, all headers excluding the inner transport header
are updated such that they contain the correct values as if the header had
simply been duplicated. The one exception to this is the outer IPv4 ID
field. It is up to the device drivers to guarantee that the IPv4 ID field
is incremented when a given header does not have the DF bit set.


SCTP acceleration with GSO
==========================

SCTP, despite the lack of hardware support, can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot simply be segmented to the (P)MTU. Rather, the chunks must be
contained in IP segments, with padding respected. So, unlike regular GSO,
SCTP can't just generate a big skb, set gso_size to the fragmentation point
and deliver it to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.
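
The difference between splitting by gso_size and splitting by
GSO_BY_FRAGS can be sketched as follows. This is a simplified userspace
model, not skb_segment(): the chained skbs are represented only by an
array of payload lengths, and MODEL_GSO_BY_FRAGS is a hypothetical
sentinel standing in for GSO_BY_FRAGS.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical sentinel standing in for GSO_BY_FRAGS in this sketch. */
    #define MODEL_GSO_BY_FRAGS 0xFFFFu

    /* Write the payload length of each output segment into out[] (at most
     * out_cap entries) and return the number of segments. frag_len[] holds
     * the payload length of each chained buffer built by the protocol. */
    static size_t model_segment(unsigned int gso_size,
                                const unsigned int *frag_len, size_t nfrags,
                                unsigned int *out, size_t out_cap)
    {
        size_t n = 0;
        unsigned long total = 0;

        if (gso_size == MODEL_GSO_BY_FRAGS) {
            /* SCTP case: keep the padded chunk boundaries exactly as built. */
            for (size_t i = 0; i < nfrags && n < out_cap; i++)
                out[n++] = frag_len[i];
            return n;
        }
        if (gso_size == 0)
            return 0; /* not a GSO packet */

        /* Regular GSO: cut the concatenated payload every gso_size bytes,
         * regardless of where the chained buffers begin and end. */
        for (size_t i = 0; i < nfrags; i++)
            total += frag_len[i];
        while (total > 0 && n < out_cap) {
            unsigned int len = total > gso_size ? gso_size
                                                 : (unsigned int)total;

            out[n++] = len;
            total -= len;
        }
        return n;
    }

For chained buffers of 1200, 400 and 1200 bytes, splitting by the
sentinel yields segments of exactly those sizes, while a gso_size of 1000
yields 1000, 1000 and 800 bytes, cutting through the middle of chunks.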

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
  an skb is an SCTP GSO skb.

- For size checks, the skb_gso_validate_*_len family of helpers correctly
  considers GSO_BY_FRAGS.

- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.
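
The pattern these helpers follow, checking for GSO_BY_FRAGS before
trusting gso_size, can be sketched in userspace. Everything here is
hypothetical: struct model_gso_skb, MODEL_GSO_BY_FRAGS and both functions
only imitate the behavior described for the skb_gso_validate_*_len family
and skb_decrease_gso_size, and do not reproduce their kernel signatures.
The length check compares every resulting segment, headers included,
against a limit such as the MTU; the decrease helper refuses GSO_BY_FRAGS
skbs where the kernel would WARN.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical sentinel standing in for GSO_BY_FRAGS in this sketch. */
    #define MODEL_GSO_BY_FRAGS 0xFFFFu

    struct model_gso_skb {
        unsigned int gso_size;        /* MSS, or MODEL_GSO_BY_FRAGS */
        unsigned int hdr_len;         /* headers replicated on each segment */
        const unsigned int *frag_len; /* payload of each chained buffer */
        size_t nfrags;
    };

    /* Would every resulting segment fit in max_len bytes? */
    static bool model_gso_validate_len(const struct model_gso_skb *skb,
                                       unsigned int max_len)
    {
        if (skb->gso_size != MODEL_GSO_BY_FRAGS)
            return skb->hdr_len + skb->gso_size <= max_len;
        /* gso_size is only a marker here; check each chained buffer. */
        for (size_t i = 0; i < skb->nfrags; i++)
            if (skb->hdr_len + skb->frag_len[i] > max_len)
                return false;
        return true;
    }

    /* Shrink the MSS. GSO_BY_FRAGS skbs are refused (the kernel helper
     * WARNs); the underflow guard is an addition of this sketch. */
    static int model_decrease_gso_size(struct model_gso_skb *skb,
                                       unsigned int dec)
    {
        if (skb->gso_size == MODEL_GSO_BY_FRAGS || dec >= skb->gso_size)
            return -1;
        skb->gso_size -= dec;
        return 0;
    }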