#
e8205119 |
| 07-Apr-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: copy_to_iter_full()
... and convert copy_linear_skb() to using that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
cc97777c |
| 05-Aug-2023 |
Yue Haibing <yuehaibing@huawei.com> |
udp/udplite: Remove unused function declarations udp{,lite}_get_port()
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declaratio
udp/udplite: Remove unused function declarations udp{,lite}_get_port()
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.") removed these implementations but leave declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9e63a99c |
| 01-Aug-2023 |
Yue Haibing <yuehaibing@huawei.com> |
udp: Remove unused function declaration udp_bpf_get_proto()
commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind this.
Signed-off-by: Yue Haibing <yuehaibing@huaw
udp: Remove unused function declaration udp_bpf_get_proto()
commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") left behind this.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801133902.3660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e1d001fa |
| 09-Jun-2023 |
Breno Leitao <leitao@debian.org> |
net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl
net: ioctl: Use kernel memory on protocol ioctl callbacks
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers.
Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback).
This changes the "struct proto" ioctl format in the following way:
int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg);
(Important to say that this patch does not touch the "struct proto_ops" protocols)
So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls:
1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace.
The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions:
* Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace.
* Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6
* Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace.
For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space.
The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d457a0e3 |
| 08-Jun-2023 |
Eric Dumazet <edumazet@google.com> |
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fom
net: move gso declarations and functions to their own files
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
1d7e4538 |
| 07-Jun-2023 |
David Howells <dhowells@redhat.com> |
ipv4, ipv6: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splic
ipv4, ipv6: Use splice_eof() to flush
Allow splice to undo the effects of MSG_MORE after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE.
For UDP, a pending packet will not be emitted if the socket is closed before it is flushed; with this change, it be flushed by ->splice_eof().
For TCP, it's not clear that MSG_MORE is actually effective.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Kuniyuki Iwashima <kuniyu@amazon.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e4fe1bf1 |
| 19-May-2023 |
Aditi Ghag <aditi.ghag@isovalent.com> |
udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
This is a preparatory commit to remove the field. The field was previously shared between proc fs and BPF UDP socket iterators. As the follow
udp: seq_file: Remove bpf_seq_afinfo from udp_iter_state
This is a preparatory commit to remove the field. The field was previously shared between proc fs and BPF UDP socket iterators. As the follow-up commits will decouple the implementation for the iterators, remove the field. As for BPF socket iterator, filtering of sockets is exepected to be done in BPF programs.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-5-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
8a3854c7 |
| 20-Oct-2022 |
Paolo Abeni <pabeni@redhat.com> |
udp: track the forward memory release threshold in an hot cacheline
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf
udp: track the forward memory release threshold in an hot cacheline
When the receiver process and the BH runs on different cores, udp_rmem_release() experience a cache miss while accessing sk_rcvbuf, as the latter shares the same cacheline with sk_forward_alloc, written by the BH.
With this patch, UDP tracks the rcvbuf value and its update via custom SOL_SOCKET socket options, and copies the forward memory threshold value used by udp_rmem_release() in a different cacheline, already accessed by the above function and uncontended.
Since the UDP socket init operation grown a bit, factor out the common code between v4 and v6 in a shared helper.
Overall the above give a 10% peek throughput increase under UDP flood.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d38afeec |
| 06-Oct-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during th
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak:
setsockopt(IPV6_ADDRFORM) sendmsg() +-----------------------+ +-------+ - do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...) - sockopt_lock_sock(sk) ^._ called via udpv6_prot - lock_sock(sk) before WRITE_ONCE() - WRITE_ONCE(sk->sk_prot, &tcp_prot) - inet6_destroy_sock() - if (!corkreq) - sockopt_release_sock(sk) - ip6_make_skb(sk, ...) - release_sock(sk) ^._ lockless fast path for the non-corking case
- __ip6_append_data(sk, ...) - ipv6_local_rxpmtu(sk, ...) - xchg(&np->rxpmtu, skb) ^._ rxpmtu is never freed.
- goto out_no_dst;
- lock_sock(sk)
For now, rxpmtu is only the case, but not to miss the future change and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that we can clean up IPv6 resources finally.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol specific ->destroy() functions, but such changes are invasive to backport. So they can be posted as a follow-up later for net-next.
Fixes: 03485f2adcde ("udpv6: Add lockless sendmsg() support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
3d72bb41 |
| 18-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("n
udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 63a6fff353d0 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
11052589 |
| 13-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we intr
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
965b57b4 |
| 15-Jun-2022 |
Cong Wang <cong.wang@bytedance.com> |
net: Introduce a new proto_ops ->read_skb()
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_soc
net: Introduce a new proto_ops ->read_skb()
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors.
For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb().
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com
show more ...
|
#
0defbb0a |
| 09-Jun-2022 |
Eric Dumazet <edumazet@google.com> |
net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use.
Instead of having reserved b
net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use.
Instead of having reserved bytes per socket, we want to have per-cpu reserves.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ec095263 |
| 11-Apr-2022 |
Oliver Hartkopp <socketcan@hartkopp.net> |
net: remove noblock parameter from recvmsg() entities
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to
net: remove noblock parameter from recvmsg() entities
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg().
Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g.
err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);
or in
err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);
instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
4721031c |
| 15-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: move gro definitions to include/net/gro.h
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David
net: move gro definitions to include/net/gro.h
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9122a70a |
| 24-Oct-2021 |
Cyril Strejc <cyril.strejc@skoda.cz> |
net: multicast: calculate csum of looped-back and forwarded packets
During a testing of an user-space application which transmits UDP multicast datagrams and utilizes multicast routing to send the U
net: multicast: calculate csum of looped-back and forwarded packets
During a testing of an user-space application which transmits UDP multicast datagrams and utilizes multicast routing to send the UDP datagrams out of defined network interfaces, I've found a multicast router does not fill-in UDP checksum into locally produced, looped-back and forwarded UDP datagrams, if an original output NIC the datagrams are sent to has UDP TX checksum offload enabled.
The datagrams are sent malformed out of the NIC the datagrams have been forwarded to.
It is because:
1. If TX checksum offload is enabled on the output NIC, UDP checksum is not calculated by kernel and is not filled into skb data.
2. dev_loopback_xmit(), which is called solely by ip_mc_finish_output(), sets skb->ip_summed = CHECKSUM_UNNECESSARY unconditionally.
3. Since 35fc92a9 ("[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE"), the ip_summed value is preserved during forwarding.
4. If ip_summed != CHECKSUM_PARTIAL, checksum is not calculated during a packet egress.
The minimum fix in dev_loopback_xmit():
1. Preserves skb->ip_summed CHECKSUM_PARTIAL. This is the case when the original output NIC has TX checksum offload enabled. The effects are:
a) If the forwarding destination interface supports TX checksum offloading, the NIC driver is responsible to fill-in the checksum.
b) If the forwarding destination interface does NOT support TX checksum offloading, checksums are filled-in by kernel before skb is submitted to the NIC driver.
c) For local delivery, checksum validation is skipped as in the case of CHECKSUM_UNNECESSARY, thanks to skb_csum_unnecessary().
2. Translates ip_summed CHECKSUM_NONE to CHECKSUM_UNNECESSARY. It means, for CHECKSUM_NONE, the behavior is unmodified and is there to skip a looped-back packet local delivery checksum validation.
Signed-off-by: Cyril Strejc <cyril.strejc@skoda.cz> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
51e0158a |
| 07-Apr-2021 |
Cong Wang <cong.wang@bytedance.com> |
skmsg: Pass psock pointer to ->psock_update_sk_prot()
Using sk_psock() to retrieve psock pointer from sock requires RCU read lock, but we already get psock pointer before calling ->psock_update_sk_p
skmsg: Pass psock pointer to ->psock_update_sk_prot()
Using sk_psock() to retrieve psock pointer from sock requires RCU read lock, but we already get psock pointer before calling ->psock_update_sk_prot() in both cases, so we can just pass it without bothering sk_psock().
Fixes: 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") Reported-by: syzbot+320a3bc8d80f478c37e4@syzkaller.appspotmail.com Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+320a3bc8d80f478c37e4@syzkaller.appspotmail.com Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210407032111.33398-1-xiyou.wangcong@gmail.com
show more ...
|
#
d7f57118 |
| 31-Mar-2021 |
Cong Wang <cong.wang@bytedance.com> |
udp: Implement ->read_sock() for sockmap
This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue.
Note, the return
udp: Implement ->read_sock() for sockmap
This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue.
Note, the return value of ->read_sock() is unused in sk_psock_verdict_data_ready(), and UDP still does not support splice() due to lack of ->splice_read(), so users can not reach udp_read_sock() directly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-12-xiyou.wangcong@gmail.com
show more ...
|
#
8a59f9d1 |
| 31-Mar-2021 |
Cong Wang <cong.wang@bytedance.com> |
sock: Introduce sk->sk_prot->psock_update_sk_prot()
Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented a
sock: Introduce sk->sk_prot->psock_update_sk_prot()
Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX.
Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331023237.41094-11-xiyou.wangcong@gmail.com
show more ...
|
#
000ac44d |
| 30-Mar-2021 |
Paolo Abeni <pabeni@redhat.com> |
udp: fixup csum for GSO receive slow path
When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path:
UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP
udp: fixup csum for GSO receive slow path
When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path:
UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx) -> UDP socket (no UDP_GRO)
ip_summed will be set to CHECKSUM_PARTIAL at creation time and such checksum mode will be preserved in the above path up to the UDP tunnel receive code where we have:
__iptunnel_pull_header() -> skb_pull_rcsum() -> skb_postpull_rcsum() -> __skb_postpull_rcsum()
The latter will convert the skb to CHECKSUM_NONE.
The UDP GSO packet will be later segmented as part of the rx socket receive operation, and will present a CHECKSUM_NONE after segmentation.
Additionally the segmented packets UDP CB still refers to the original GSO packet len. Overall that causes unexpected/wrong csum validation errors later in the UDP receive path.
We could possibly address the issue with some additional checks and csum mangling in the UDP tunnel code. Since the issue affects only this UDP receive slow path, let's set a suitable csum status there.
Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP encapsulation present a valid checksum when landing to udp_queue_rcv_skb(), as the UDP checksum has been validated by the GRO engine.
v2 -> v3: - even more verbose commit message and comments
v1 -> v2: - restrict the csum update to the packets strictly needing them - hopefully clarify the commit message and code comments
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
88759609 |
| 23-Feb-2021 |
Cong Wang <cong.wang@bytedance.com> |
bpf: Clean up sockmap related Kconfigs
As suggested by John, clean up sockmap related Kconfigs:
Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name.
Make th
bpf: Clean up sockmap related Kconfigs
As suggested by John, clean up sockmap related Kconfigs:
Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name.
Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-2-xiyou.wangcong@gmail.com
show more ...
|
#
f9a4719c |
| 03-Feb-2021 |
Leon Romanovsky <leonro@nvidia.com> |
ipv6: move udp declarations to net/udp.h
Fix the following compilation warning:
net/ipv6/udp.c:1031:30: warning: no previous prototype for 'udp_v6_early_demux' [-Wmissing-prototypes] 1031 | INDIRE
ipv6: move udp declarations to net/udp.h
Fix the following compilation warning:
net/ipv6/udp.c:1031:30: warning: no previous prototype for 'udp_v6_early_demux' [-Wmissing-prototypes] 1031 | INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb) | ^~~~~~~~~~~~~~~~~~ net/ipv6/udp.c:1072:29: warning: no previous prototype for 'udpv6_rcv' [-Wmissing-prototypes] 1072 | INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb) | ^~~~~~~~~
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a4a600dd |
| 03-Feb-2021 |
Xin Long <lucien.xin@gmail.com> |
udp: call udp_encap_enable for v6 sockets when enabling encap
When enabling encap for a ipv6 socket without udp_encap_needed_key increased, UDP GRO won't work for v4 mapped v6 address packets as sk
udp: call udp_encap_enable for v6 sockets when enabling encap
When enabling encap for a ipv6 socket without udp_encap_needed_key increased, UDP GRO won't work for v4 mapped v6 address packets as sk will be NULL in udp4_gro_receive().
This patch is to enable it by increasing udp_encap_needed_key for v6 sockets in udp_tunnel_encap_enable(), and correspondingly decrease udp_encap_needed_key in udpv6_destroy_sock().
v1->v2: - add udp_encap_disable() and export it. v2->v3: - add the change for rxrpc and bareudp into one patch, as Alex suggested. v3->v4: - move rxrpc part to another patch.
Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
c3df39ac |
| 29-Jan-2021 |
Dongseok Yi <dseok.yi@samsung.com> |
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT forwarding. Only the header of head_skb from ip_finish_output_gso -
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT forwarding. Only the header of head_skb from ip_finish_output_gso -> skb_gso_segment is updated but following frag_skbs are not updated.
A call path skb_mac_gso_segment -> inet_gso_segment -> udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list does not try to update UDP/IP header of the segment list but copy only the MAC header.
Update port, addr and check of each skb of the segment list in __udp_gso_segment_list. It covers both SNAT and DNAT.
Fixes: 9fd1ff5d2ac7 (udp: Support UDP fraglist GRO/GSO.) Signed-off-by: Dongseok Yi <dseok.yi@samsung.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/1611962007-80092-1-git-send-email-dseok.yi@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7b58e63e |
| 09-Nov-2020 |
Eric Dumazet <edumazet@google.com> |
inet: udp{4|6}_lib_lookup_skb() skb argument is const
The skb is needed only to fetch the keys for the lookup.
Both functions are used from GRO stack, we do not want accidental modification of the
inet: udp{4|6}_lib_lookup_skb() skb argument is const
The skb is needed only to fetch the keys for the lookup.
Both functions are used from GRO stack, we do not want accidental modification of the skb.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|