#
173e7622 |
| 02-May-2024 |
Mina Almasry <almasrymina@google.com> |
Revert "net: mirror skb frag ref/unref helpers"
This reverts commit a580ea994fd37f4105028f5a85c38ff6508a2b25.
This revert is to resolve Dragos's report of page_pool leak here: https://lore.kernel.o
Revert "net: mirror skb frag ref/unref helpers"
This reverts commit a580ea994fd37f4105028f5a85c38ff6508a2b25.
This revert is to resolve Dragos's report of page_pool leak here: https://lore.kernel.org/lkml/20240424165646.1625690-2-dtatulea@nvidia.com/
The reverted patch interacts very badly with commit 2cc3aeb5eccc ("skbuff: Fix a potential race while recycling page_pool packets"). The reverted commit hopes that the pp_recycle + is_pp_page variables do not change between the skb_frag_ref and skb_frag_unref operation. If such a change occurs, the skb_frag_ref/unref will not operate on the same reference type. In the case of Dragos's report, the grabbed ref was a pp ref, but the unref was a page ref, because the pp_recycle setting on the skb was changed.
Attempting to fix this issue on the fly is risky. Lets revert and I hope to reland this with better understanding and testing to ensure we don't regress some edge case while streamlining skb reffing.
Fixes: a580ea994fd3 ("net: mirror skb frag ref/unref helpers") Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240502175423.2456544-1-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d091e579 |
| 27-Apr-2024 |
Felix Fietkau <nbd@nbd.name> |
net: core: reject skb_copy(_expand) for fraglist GSO skbs
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become invalid. Return NULL if such an skb is passed to skb_copy or skb_copy_ex
net: core: reject skb_copy(_expand) for fraglist GSO skbs
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become invalid. Return NULL if such an skb is passed to skb_copy or skb_copy_expand, in order to prevent a crash on a potential later call to skb_gso_segment.
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d480dc76 |
| 29-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
net: move sysctl_skb_defer_max to net_hotdata
sysctl_skb_defer_max is used in TCP fast path, move it to net_hodata.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahe
net: move sysctl_skb_defer_max to net_hotdata
sysctl_skb_defer_max is used in TCP fast path, move it to net_hodata.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a86a0661 |
| 29-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
net: move sysctl_max_skb_frags to net_hotdata
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google
net: move sysctl_max_skb_frags to net_hotdata
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
65bada80 |
| 19-Apr-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
net: add callback for setting a ubuf_info to skb
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io
net: add callback for setting a ubuf_info to skb
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning ubuf_info to skb, this way we will implement smarter assignment later like linking ubuf_info together.
Note, it's an optional callback, which should be compatible with skb_zcopy_set(), that's because the net stack might potentially decide to clone an skb and take another reference to ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without prior ubuf_info, otherwise we could end up in a situation when the send would not be able to progress.
Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7ab4f16f |
| 19-Apr-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
net: extend ubuf_info callback to ops structure
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring not
net: extend ubuf_info callback to ops structure
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr.
Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
4d0470b9 |
| 12-Apr-2024 |
Jason Xing <kernelxing@tencent.com> |
net: save some cycles when doing skb_attempt_defer_free()
Normally, we don't face these two exceptions very often meanwhile we have some chance to meet the condition where the current cpu id is the
net: save some cycles when doing skb_attempt_defer_free()
Normally, we don't face these two exceptions very often meanwhile we have some chance to meet the condition where the current cpu id is the same as skb->alloc_cpu.
One simple test that can help us see the frequency of this statement 'cpu == raw_smp_processor_id()': 1. running iperf -s and iperf -c [ip] -P [MAX CPU] 2. using BPF to capture skb_attempt_defer_free()
I can see around 4% chance that happens to satisfy the statement. So moving this statement at the beginning can save some cycles in most cases.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a580ea99 |
| 10-Apr-2024 |
Mina Almasry <almasrymina@google.com> |
net: mirror skb frag ref/unref helpers
Refactor some of the skb frag ref/unref helpers for improved clarity.
Implement napi_pp_get_page() to be the mirror counterpart of napi_pp_put_page().
Implem
net: mirror skb frag ref/unref helpers
Refactor some of the skb frag ref/unref helpers for improved clarity.
Implement napi_pp_get_page() to be the mirror counterpart of napi_pp_put_page().
Implement skb_page_ref() to be the mirror of skb_page_unref().
Improve __skb_frag_ref() to become a mirror counterpart of __skb_frag_unref(). Previously unref could handle pp & non-pp pages, while the ref could only handle non-pp pages. Now both the ref & unref helpers can correctly handle both pp & non-pp pages.
Now that __skb_frag_ref() can handle both pp & non-pp pages, remove skb_pp_frag_ref(), and use __skb_frag_ref() instead. This lets us remove pp specific handling from skb_try_coalesce.
Additionally, since __skb_frag_ref() can now handle both pp & non-pp pages, a latent issue in skb_shift() should now be fixed. Previously this function would do a non-pp ref & pp unref on potential pp frags (fragfrom). After this patch, skb_shift() should correctly do a pp ref/unref on pp frags.
Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240410190505.1225848-3-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f6d827b1 |
| 10-Apr-2024 |
Mina Almasry <almasrymina@google.com> |
net: move skb ref helpers to new header
Add a new header, linux/skbuff_ref.h, which contains all the skb_*_ref() helpers. Many of the consumers of skbuff.h do not actually use any of the skb ref hel
net: move skb ref helpers to new header
Add a new header, linux/skbuff_ref.h, which contains all the skb_*_ref() helpers. Many of the consumers of skbuff.h do not actually use any of the skb ref helpers, and we can speed up compilation a bit by minimizing this header file.
Additionally in the later patch in the series we add page_pool support to skb_frag_ref(), which requires some page_pool dependencies. We can now add these dependencies to skbuff_ref.h instead of a very ubiquitous skbuff.h
Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240410190505.1225848-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d8415a16 |
| 10-Apr-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
net: use SKB_CONSUMED in skb_attempt_defer_free()
skb_attempt_defer_free() is used to free already processed skbs, so pass SKB_CONSUMED as the reason in kfree_skb_napi_cache().
Suggested-by: Jason
net: use SKB_CONSUMED in skb_attempt_defer_free()
skb_attempt_defer_free() is used to free already processed skbs, so pass SKB_CONSUMED as the reason in kfree_skb_napi_cache().
Suggested-by: Jason Xing <kerneljasonxing@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/bcf5dbdda79688b074ab7ae2238535840a6d3fc2.1712711977.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7cb31c46 |
| 10-Apr-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
net: cache for same cpu skb_attempt_defer_free
Optimise skb_attempt_defer_free() when run by the same CPU the skb was allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can disable softi
net: cache for same cpu skb_attempt_defer_free
Optimise skb_attempt_defer_free() when run by the same CPU the skb was allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can disable softirqs and put the buffer into cpu local caches.
CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1% throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles, the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the optimisation is for the receive path, but the test spends >55% of CPU doing writes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f58f3c95 |
| 08-Apr-2024 |
Mina Almasry <almasrymina@google.com> |
net: remove napi_frag_unref
With the changes in the last patches, napi_frag_unref() is now reduandant. Remove it and use skb_page_unref directly.
Signed-off-by: Mina Almasry <almasrymina@google.com
net: remove napi_frag_unref
With the changes in the last patches, napi_frag_unref() is now reduandant. Remove it and use skb_page_unref directly.
Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240408153000.2152844-4-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
4308811b |
| 07-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
net: display more skb fields in skb_dump()
Print these additional fields in skb_dump() to ease debugging.
- mac_len - csum_start (in v2, at Willem suggestion) - csum_offset (in v2, at Willem sugges
net: display more skb fields in skb_dump()
Print these additional fields in skb_dump() to ease debugging.
- mac_len - csum_start (in v2, at Willem suggestion) - csum_offset (in v2, at Willem suggestion) - priority - mark - alloc_cpu - vlan_all - encapsulation - inner_protocol - inner_mac_header - inner_network_header - inner_transport_header
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4a96a4e8 |
| 29-Mar-2024 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
page_pool: check for PP direct cache locality later
Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check whether it's safe to use direct recycling, we can use both globally for each
page_pool: check for PP direct cache locality later
Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check whether it's safe to use direct recycling, we can use both globally for each page instead of relying solely on @allow_direct argument. Let's assume that @allow_direct means "I'm sure it's local, don't waste time rechecking this" and when it's false, try the mentioned params to still recycle the page directly. If neither is true, we'll lose some CPU cycles, but then it surely won't be hotpath. On the other hand, paths where it's possible to use direct cache, but not possible to safely set @allow_direct, will benefit from this move. The whole propagation of @napi_safe through a dozen of skb freeing functions can now go away, which saves us some stack space.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240329165507.3240110-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
6e9b0190 |
| 27-Mar-2024 |
Jakub Kicinski <kuba@kernel.org> |
net: remove gfp_mask from napi_alloc_skb()
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practic
net: remove gfp_mask from napi_alloc_skb()
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable.
This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail).
The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
80d2eefc |
| 25-Mar-2024 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
net: Use backlog-NAPI to clean up the defer_list.
The defer_list is a per-CPU list which is used to free skbs outside of the socket lock and on the CPU on which they have been allocated. The list is
net: Use backlog-NAPI to clean up the defer_list.
The defer_list is a per-CPU list which is used to free skbs outside of the socket lock and on the CPU on which they have been allocated. The list is processed during NAPI callbacks so ideally the list is cleaned up. Should the amount of skbs on the list exceed a certain water mark then the softirq is triggered remotely on the target CPU by invoking a remote function call. The raise of the softirqs via a remote function call leads to waking the ksoftirqd on PREEMPT_RT which is undesired. The backlog-NAPI threads already provide the infrastructure which can be utilized to perform the cleanup of the defer_list.
The NAPI state is updated with the input_pkt_queue.lock acquired. It order not to break the state, it is needed to also wake the backlog-NAPI thread with the lock held. This requires to acquire the use the lock in rps_lock_irq*() if the backlog-NAPI threads are used even with RPS disabled.
Move the logic of remotely starting softirqs to clean up the defer_list into kick_defer_list_purge(). Make sure a lock is held in rps_lock_irq*() if backlog-NAPI threads are used. Schedule backlog-NAPI for defer_list cleanup if backlog-NAPI is available.
Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
1cface55 |
| 07-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
net: add skb_data_unref() helper
Similar to skb_unref(), add skb_data_unref() to save an expensive atomic operation (and cache line dirtying) when last reference on shinfo->dataref is released.
I s
net: add skb_data_unref() helper
Similar to skb_unref(), add skb_data_unref() to save an expensive atomic operation (and cache line dirtying) when last reference on shinfo->dataref is released.
I saw this opportunity on hosts with RAW sockets accidentally bound to UDP protocol, forcing an skb_clone() on all received packets.
These RAW sockets had their receive queue full, so all clone packets were immediately dropped.
When UDP recvmsg() consumes later the original skb, skb_release_data() is hitting atomic_sub_return() quite badly, because skb->clone has been set permanently.
Note that this patch helps TCP TX performance, because TCP stack also use (fast) clones.
This means that at least one of the two packets (the main skb or its clone) will no longer have to perform this atomic operation in skb_release_data().
Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240307123446.2302230-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
aa70d2d1 |
| 06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
net: move skbuff_cache(s) to net_hotdata
skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths.
Move them to net_hotdata for better cache locality.
Signed-off-by:
net: move skbuff_cache(s) to net_hotdata
skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths.
Move them to net_hotdata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
411c5f36 |
| 28-Feb-2024 |
Yunsheng Lin <linyunsheng@huawei.com> |
mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument
napi_alloc_frag_align() and netdev_alloc_frag_align() accept align as an argument, and they are thin wrappers around the
mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument
napi_alloc_frag_align() and netdev_alloc_frag_align() accept align as an argument, and they are thin wrappers around the __napi_alloc_frag_align() and __netdev_alloc_frag_align() APIs doing the alignment checking and align mask conversion, in order to call page_frag_alloc_align() directly. The intention here is to keep the alignment checking and the alignmask conversion in in-line wrapper to avoid those kind of operations during execution time since it can usually be handled during compile time.
We are going to use page_frag_alloc_align() in vhost_net.c, it need the same kind of alignment checking and alignmask conversion, so split up page_frag_alloc_align into an inline wrapper doing the above operation, and add __page_frag_alloc_align() which is passed with the align mask the original function expected as suggested by Alexander.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Alexander Duyck <alexander.duyck@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
1394c1de |
| 19-Feb-2024 |
Jeremy Kerr <jk@codeconstruct.com.au> |
net: mctp: copy skb ext data when fragmenting
If we're fragmenting on local output, the original packet may contain ext data for the MCTP flows. We'll want this in the resulting fragment skbs too.
net: mctp: copy skb ext data when fragmenting
If we're fragmenting on local output, the original packet may contain ext data for the MCTP flows. We'll want this in the resulting fragment skbs too.
So, do a skb_ext_copy() in the fragmentation path, and implement the MCTP-specific parts of an ext copy operation.
Fixes: 67737c457281 ("mctp: Pass flow data & flow release events to drivers") Reported-by: Jian Zhang <zhangjian.3032@bytedance.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
c6a28acb |
| 17-Feb-2024 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net: fix pointer check in skb_pp_cow_data routine
Properly check page pointer returned by page_pool_dev_alloc routine in skb_pp_cow_data() for non-linear part of the original skb.
Reported-by: Juli
net: fix pointer check in skb_pp_cow_data routine
Properly check page pointer returned by page_pool_dev_alloc routine in skb_pp_cow_data() for non-linear part of the original skb.
Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/cover.1707729884.git.lorenzo@kernel.org/T/#m7d189b0015a7281ed9221903902490c03ed19a7a Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/25512af3e09befa9dcb2cf3632bdc45b807cf330.1708167716.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
21d2e673 |
| 14-Feb-2024 |
Mina Almasry <almasrymina@google.com> |
net: add netmem to skb_frag_t
Use struct netmem* instead of page in skb_frag_t. Currently struct netmem* is always a struct page underneath, but the abstraction allows efforts to add support for skb
net: add netmem to skb_frag_t
Use struct netmem* instead of page in skb_frag_t. Currently struct netmem* is always a struct page underneath, but the abstraction allows efforts to add support for skb frags not backed by pages.
There is unfortunately 1 instance where the skb_frag_t is assumed to be a exactly a bio_vec in kcm. For this case, WARN_ON_ONCE and return error before doing a cast.
Add skb[_frag]_fill_netmem_*() and skb_add_rx_frag_netmem() helpers so that the API can be used to create netmem skbs.
Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
56ef27e3 |
| 15-Feb-2024 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
page_pool: disable direct recycling based on pool->cpuid on destroy
Now that direct recycling is performed basing on pool->cpuid when set, memory leaks are possible:
1. A pool is destroyed. 2. Allo
page_pool: disable direct recycling based on pool->cpuid on destroy
Now that direct recycling is performed basing on pool->cpuid when set, memory leaks are possible:
1. A pool is destroyed. 2. Alloc cache is emptied (it's done only once). 3. pool->cpuid is still set. 4. napi_pp_put_page() does direct recycling basing on pool->cpuid. 5. Now alloc cache is not empty, but it won't ever be freed.
In order to avoid that, rewrite pool->cpuid to -1 when unlinking NAPI to make sure no direct recycling will be possible after emptying the cache. This involves a bit of overhead as pool->cpuid now must be accessed via READ_ONCE() to avoid partial reads. Rename page_pool_unlink_napi() -> page_pool_disable_direct_recycling() to reflect what it actually does and unexport it.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240215113905.96817-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
27accb3c |
| 12-Feb-2024 |
Lorenzo Bianconi <lorenzo@kernel.org> |
veth: rely on skb_pp_cow_data utility routine
Rely on skb_pp_cow_data utility routine and remove duplicated code.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgen
veth: rely on skb_pp_cow_data utility routine
Rely on skb_pp_cow_data utility routine and remove duplicated code.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/029cc14cce41cb242ee7efdcf32acc81f1ce4e9f.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e6d5dbdd |
| 12-Feb-2024 |
Lorenzo Bianconi <lorenzo@kernel.org> |
xdp: add multi-buff support for xdp running in generic mode
Similar to native xdp, do not always linearize the skb in netif_receive_generic_xdp routine but create a non-linear xdp_buff to be process
xdp: add multi-buff support for xdp running in generic mode
Similar to native xdp, do not always linearize the skb in netif_receive_generic_xdp routine but create a non-linear xdp_buff to be processed by the eBPF program. This allow to add multi-buffer support for xdp running in generic mode.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/1044d6412b1c3e95b40d34993fd5f37cd2f319fd.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|