6e063120 | 25-Mar-2024 | Eric Dumazet <edumazet@google.com>
net: remove skb_free_datagram_locked()
Last user of skb_free_datagram_locked() went away in 2016 with commit 850cbaddb52d ("udp: use it's own memory accounting schema").
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240325134155.620531-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
fe92f874 | 31-Jan-2024 | Michael Lass <bevan@bi-co.net>
net: Fix from address in memcpy_to_iter_csum()
While inlining csum_and_memcpy() into memcpy_to_iter_csum(), the from address passed to csum_partial_copy_nocheck() was accidentally changed. This causes a regression in applications using UDP, as for example OpenAFS, causing loss of datagrams.
Fixes: dc32bff195b4 ("iov_iter, net: Fold in csum_and_memcpy()")
Cc: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Cc: regressions@lists.linux.dev
Signed-off-by: Michael Lass <bevan@bi-co.net>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b5f0e20f | 25-Sep-2023 | David Howells <dhowells@redhat.com>
iov_iter, net: Move hash_and_copy_to_iter() to net/
Move hash_and_copy_to_iter() to be with its only caller in networking code.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-13-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
7c6f353e | 25-Sep-2023 | David Howells <dhowells@redhat.com>
iov_iter, net: Merge csum_and_copy_from_iter{,_full}() together
Move csum_and_copy_from_iter_full() out of line and then merge csum_and_copy_from_iter() into its only caller.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-12-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
dc32bff1 | 25-Sep-2023 | David Howells <dhowells@redhat.com>
iov_iter, net: Fold in csum_and_memcpy()
Fold csum_and_memcpy() into its callers.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-11-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
6d0d4199 | 25-Sep-2023 | David Howells <dhowells@redhat.com>
iov_iter, net: Move csum_and_copy_to/from_iter() to net/
Move csum_and_copy_to/from_iter() to net code now that the iteration framework can be #included.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20230925120309.1731676-10-dhowells@redhat.com
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Christian Brauner <christian@brauner.io>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: netdev@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
5bca1d08 | 09-May-2023 | Eric Dumazet <edumazet@google.com>
net: datagram: fix data-races in datagram_poll()
datagram_poll() runs locklessly, so add READ_ONCE() annotations when reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
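As a hedged sketch (abridged from the shape of datagram_poll(); the surrounding logic is simplified, not the complete function), the annotation pattern looks like this:

    /* Lockless readers of sk fields get READ_ONCE() so the compiler
     * cannot tear or refetch the loads while writers update the
     * fields concurrently (abridged sketch, not the full function).
     */
    static __poll_t datagram_poll_sketch(struct sock *sk)
    {
            __poll_t mask = 0;

            if (READ_ONCE(sk->sk_err))
                    mask |= EPOLLERR;

            if (READ_ONCE(sk->sk_shutdown) == SHUTDOWN_MASK)
                    mask |= EPOLLHUP;

            if (connection_based(sk) &&
                READ_ONCE(sk->sk_state) == TCP_CLOSE)
                    mask |= EPOLLHUP;

            return mask;
    }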
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
593ef60c | 21-Mar-2023 | Xiaoyan Li <lixiaoyan@google.com>
net-zerocopy: Reduce compound page head access
When compound pages are enabled, the mm layer still returns an array of page pointers, but a subset (or all) of them may share the same page head: a max 180kb skb can span 2 hugepages if it sits on a boundary, be a mix of regular pages and 1 hugepage, or fit completely inside a hugepage. Instead of dereferencing the page head for every page pointer, use page length arithmetic and only look up the page head when it is known to differ, avoiding a touch of a cold cacheline.
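A hedged sketch of the page-length arithmetic (illustrative names, not the exact kernel code):

    /* Only dereference the compound head when the running offset
     * leaves the span of the head we already know; other pages of
     * the same hugepage are handled with plain offset arithmetic.
     */
    static void walk_frag_pages(struct page **pages, int n)
    {
            struct page *head = NULL;
            size_t head_off = 0, off = 0;
            int i;

            for (i = 0; i < n; i++, off += PAGE_SIZE) {
                    if (!head || off - head_off >= page_size(head)) {
                            head = compound_head(pages[i]); /* cold line */
                            head_off = off;
                    }
                    /* ... reference head + (off - head_off) ... */
            }
    }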
Tested: See next patch with changes to tcp_mmap
Correctness: run between a pair of separate hosts, since sending with MSG_ZEROCOPY over loopback alone forces a copy on tx. Check that the SHA of the message sent matches the checksum of the message received; the program already checks the length.
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
./tcp_mmap -s -z
./tcp_mmap -H $DADDR -z
SHA256 is correct
received 2 MB (100 % mmap'ed) in 0.005914 s, 2.83686 Gbit
cpu usage user:0.001984 sys:0.000963, 1473.5 usec per MB, 10 c-switches
Performance: run neper between adjacent hosts with the same config:
tcp_stream -Z --skip-rx-copy -6 -T 20 -F 1000 --stime-use-proc --test-length=30
Before patch: stime_end=37.670000
After patch:  stime_end=30.310000
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230321081202.2370275-1-lixiaoyan@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
32614006 | 31-Aug-2022 | Eric Dumazet <edumazet@google.com>
tcp: TX zerocopy should not sense pfmemalloc status
We got a recent syzbot report [1] showing a possible misuse of pfmemalloc page status in TCP zerocopy paths.
Indeed, for pages coming from user space or other layers, using page_is_pfmemalloc() is moot, and could give false positives.
There have been attempts to make page_is_pfmemalloc() more robust, but not using it in the first place in this context is probably better, removing cpu cycles.
Note to stable teams :
You need to backport 84ce071e38a6 ("net: introduce __skb_fill_page_desc_noacc") as a prereq.
Race is more probable after commit c07aea3ef4d4 ("mm: add a signature in struct page") because page_is_pfmemalloc() is now using low order bit from page->lru.next, which can change more often than page->index.
Low order bit should never be set for lru.next (when used as an anchor in LRU list), so KCSAN report is mostly a false positive.
Backporting to older kernel versions seems not necessary.
[1] BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag
write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0:
 __list_add include/linux/list.h:73 [inline]
 list_add include/linux/list.h:88 [inline]
 lruvec_add_folio include/linux/mm_inline.h:105 [inline]
 lru_add_fn+0x440/0x520 mm/swap.c:228
 folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
 folio_batch_add_and_move mm/swap.c:263 [inline]
 folio_add_lru+0xf1/0x140 mm/swap.c:490
 filemap_add_folio+0xf8/0x150 mm/filemap.c:948
 __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981
 pagecache_get_page+0x26/0x190 mm/folio-compat.c:104
 grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116
 ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988
 generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738
 ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270
 ext4_file_write_iter+0x2e3/0x1210
 call_write_iter include/linux/fs.h:2187 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x468/0x760 fs/read_write.c:578
 ksys_write+0xe8/0x1a0 fs/read_write.c:631
 __do_sys_write fs/read_write.c:643 [inline]
 __se_sys_write fs/read_write.c:640 [inline]
 __x64_sys_write+0x3e/0x50 fs/read_write.c:640
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1:
 page_is_pfmemalloc include/linux/mm.h:1740 [inline]
 __skb_fill_page_desc include/linux/skbuff.h:2422 [inline]
 skb_fill_page_desc include/linux/skbuff.h:2443 [inline]
 tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018
 do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075
 tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline]
 tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150
 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
 kernel_sendpage+0x184/0x300 net/socket.c:3561
 sock_sendpage+0x5a/0x70 net/socket.c:1054
 pipe_to_sendpage+0x128/0x160 fs/splice.c:361
 splice_from_pipe_feed fs/splice.c:415 [inline]
 __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
 splice_from_pipe fs/splice.c:594 [inline]
 generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
 do_splice_from fs/splice.c:764 [inline]
 direct_splice_actor+0x80/0xa0 fs/splice.c:931
 splice_direct_to_actor+0x305/0x620 fs/splice.c:886
 do_splice_direct+0xfb/0x180 fs/splice.c:974
 do_sendfile+0x3bf/0x910 fs/read_write.c:1249
 __do_sys_sendfile64 fs/read_write.c:1317 [inline]
 __se_sys_sendfile64 fs/read_write.c:1303 [inline]
 __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0xffffea0004a1d288
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Fixes: c07aea3ef4d4 ("mm: add a signature in struct page")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ef255e2 | 09-Jun-2022 | Al Viro <viro@zeniv.linux.org.uk>
iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned.
Provide inline wrappers doing that, convert trivial open-coded uses of those.
BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday...
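The wrapper pattern being described, sketched with a hypothetical name (the real helpers live in lib/iov_iter.c):

    /* Non-advancing primitive plus explicit advance, wrapped so
     * callers cannot forget the second step.
     */
    static inline ssize_t iov_iter_get_pages_and_advance(struct iov_iter *i,
                    struct page **pages, size_t maxsize,
                    unsigned int maxpages, size_t *start)
    {
            ssize_t res = iov_iter_get_pages(i, pages, maxsize,
                                             maxpages, start);

            if (res > 0)
                    iov_iter_advance(i, res);
            return res;
    }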
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2829a267 | 21-Jul-2022 | Pavel Begunkov <asml.silence@gmail.com>
net: fix uninitialised msghdr->sg_from_iter
Because of how struct msghdr is usually initialised, some fields, sg_from_iter in particular, might be left uninitialised, so we can't safely use it in __zerocopy_sg_from_iter().
For now, use the callback only when ->msg_ubuf is set, relying on the fact that the two are used together and that ->msg_ubuf is properly zeroed.
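As a hedged sketch, the resulting guard looks like this (the default fallback path is elided):

    if (msg && msg->msg_ubuf && msg->sg_from_iter)
            return msg->sg_from_iter(sk, skb, from, length);
    /* ...otherwise fall through to the default page-filling path... */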
Fixes: ebe73a284f4de8 ("net: Allow custom iter handler in msghdr")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ebe73a28 | 12-Jul-2022 | David Ahern <dsahern@kernel.org>
net: Allow custom iter handler in msghdr
Add support for custom iov_iter handling to msghdr. The idea is that in-kernel subsystems want control over how an SG is split.
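A hedged sketch of the addition to struct msghdr (existing fields elided):

    struct msghdr {
            /* ... existing fields ... */
            struct ubuf_info *msg_ubuf;
            /* in-kernel callers may override how pages are gathered
             * from the iterator into skb frags:
             */
            int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb,
                                struct iov_iter *from, size_t length);
    };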
Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4890b686 | 09-Jun-2022 | Eric Dumazet <edumazet@google.com>
net: keep sk->sk_forward_alloc as small as possible
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast.
Each TCP socket can forward allocate up to 2 MB of memory, even after the flow has become less active.
10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that.
Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible.
This should not impact performance too much now that we have per-cpu reserves: changes to tcp_memory_allocated should not be too frequent.
For sockets not using SO_RESERVE_MEM:
- idle sockets (no packets in tx/rx queues) have zero forward alloc.
- non-idle sockets have a forward alloc smaller than one page.
Note:
- Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
657dd5f9 | 28-Apr-2022 | Pavel Begunkov <asml.silence@gmail.com>
net: inline skb_zerocopy_iter_dgram
skb_zerocopy_iter_dgram() is a small proxy function, so inline it. To do that, move __zerocopy_sg_from_iter() into linux/skbuff.h.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f4b41f06 | 04-Apr-2022 | Oliver Hartkopp <socketcan@hartkopp.net>
net: remove noblock parameter from skb_recv_datagram()
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
As 'flags' may already contain MSG_DONTWAIT, most callers split 'flags' into 'flags' and 'noblock' with ultimately obsolete bit operations like this:
skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
And this is not even done consistently with the 'flags' parameter.
This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side.
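A hedged before/after sketch of a typical call site:

    /* before: the caller splits flags only for the callee to re-merge them */
    skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
                            flags & MSG_DONTWAIT, &err);

    /* after: MSG_DONTWAIT is simply left as a bit inside flags */
    skb = skb_recv_datagram(sk, flags, &err);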
One missing conversion was thankfully reported by the kernel test robot; I had missed enabling the kunit tests that build the mctp code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b65b17d | 03-Nov-2021 | Talal Ahmad <talalahmad@google.com>
net: avoid double accounting for pure zerocopy skbs
Track skbs containing only zerocopy data and avoid charging them to kernel memory to correctly account the memory utilization for msg_zerocopy. All of the data in such skbs is held in user pages which are already accounted to user. Before this change, they are charged again in kernel in __zerocopy_sg_from_iter. The charging in kernel is excessive because data is not being copied into skb frags. This excessive charging can lead to kernel going into memory pressure state which impacts all sockets in the system adversely. Mark pure zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove charge/uncharge for data in such skbs.
Initially, an skb is marked pure zerocopy when it is empty and in the zerocopy path. The skb can then change from a pure zerocopy skb to a mixed data skb (zerocopy and copy data) if it is at the tail of the write queue, there is room available in it, and non-zerocopy data is being sent in the next sendmsg call. At this time sk_mem_charge is done for the pure zerocopied data and the pure zerocopy flag is unmarked. We found that this happens very rarely on workloads that pass MSG_ZEROCOPY.
A pure zerocopy skb can later be coalesced into normal skb if they are next to each other in queue but this patch prevents coalescing from happening. This avoids complexity of charging when skb downgrades from pure zerocopy to mixed. This is also rare.
In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in tcp_skb_entail for an skb without data.
Testing with the msg_zerocopy.c benchmark between two hosts(100G nics) with zerocopy showed that before this patch the 'sock' variable in memory.stat for cgroup2 that tracks sum of sk_forward_alloc, sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this change it is 0. This is due to no charge to sk_forward_alloc for zerocopy data and shows memory utilization for kernel is lowered.
With this commit we don't see the warning seen with the previous attempt, which resulted in commit 84882cf72cd774cf16fd338bdbf00f69ac9f9194.
Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
52cbd23a | 03-Feb-2021 | Willem de Bruijn <willemb@google.com>
udp: fix skb_copy_and_csum_datagram with odd segment sizes
When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd.
The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call.
Bring back the pos by passing it along with the csum to the callback.
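The rotation that tracking "pos" restores can be demonstrated with a small standalone program (a simplified big-endian-style fold for illustration, not the kernel's csum implementation):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified 16-bit one's-complement sum over a buffer. */
    static uint32_t csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
    {
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += (uint32_t)buf[i] << 8 | buf[i + 1];
        if (len & 1)
            sum += (uint32_t)buf[len - 1] << 8;
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return sum;
    }

    /* Fold a block's sum into a running sum at byte offset "pos":
     * a block landing at an odd offset contributes byte-swapped
     * words, hence the rotate by 8 bits.
     */
    static uint32_t csum_block_add(uint32_t sum, uint32_t part, size_t pos)
    {
        if (pos & 1)
            part = ((part & 0xff) << 8) | (part >> 8);
        sum += part;
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return sum;
    }

    int main(void)
    {
        uint8_t data[5] = { 0x01, 0x02, 0x03, 0x04, 0x05 };
        uint32_t whole = csum_partial(data, 5, 0);
        /* split at odd offset 3, as an odd-sized segment would */
        uint32_t a = csum_partial(data, 3, 0);
        uint32_t b = csum_partial(data + 3, 2, 0);
        uint32_t merged = csum_block_add(csum_block_add(0, a, 0), b, 3);

        /* prints whole=0906 merged=0906; dropping the rotation
         * (as the regression did) would give merged=0807
         */
        printf("whole=%04x merged=%04x\n", whole, merged);
        return 0;
    }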
Changes v1->v2:
- pass csum value, instead of csump pointer (Alexander Duyck)
Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute <oliver.graute@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
c1639be9 | 16-Nov-2020 | Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
net: datagram: fix some kernel-doc markups
Some identifiers have different names between their prototypes and the kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
394fcd8a | 20-Aug-2020 | Eric Dumazet <edumazet@google.com>
net: zerocopy: combine pages in zerocopy_sg_from_iter()
Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0 fragments. Compared to standard sendmsg(), these skbs usually contain up to 16 fragments on arches with 4KB page sizes, instead of two.
This adds considerable costs on various ndo_start_xmit() handlers, especially when IOMMU is in the picture.
As high performance applications often use huge pages, we can try to combine adjacent pages belonging to the same compound page.
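A hedged sketch of the coalescing test (illustrative function name; the real code also enforces the frag-slot limit):

    /* Append (page, offset, size) to skb, merging with the previous
     * fragment when the new chunk directly continues it.
     */
    static void fill_or_coalesce(struct sk_buff *skb, struct page *page,
                                 unsigned int offset, unsigned int size)
    {
            struct skb_shared_info *shinfo = skb_shinfo(skb);

            if (shinfo->nr_frags) {
                    skb_frag_t *last = &shinfo->frags[shinfo->nr_frags - 1];

                    if (skb_frag_page(last) == page &&
                        skb_frag_off(last) + skb_frag_size(last) == offset) {
                            /* continues the previous fragment inside the
                             * same page: grow it instead of a new slot
                             */
                            skb_frag_size_add(last, size);
                            return;
                    }
            }
            skb_fill_page_desc(skb, shinfo->nr_frags, page, offset, size);
    }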
Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed is roughly doubled (~55Gbit -> ~100Gbit), when user application is using hugepages.
For reference, nominal single TCP flow speed on this platform without MSG_ZEROCOPY is ~65Gbit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
29f3490b | 25-Mar-2020 | Eric Dumazet <edumazet@google.com>
net: use indirect call wrappers for skb_copy_datagram_iter()
TCP recvmsg() calls skb_copy_datagram_iter(), which calls an indirect function (cb pointing to simple_copy_to_iter()) for every MSS (fragment) present in the skb.
CONFIG_RETPOLINE=y forces a very expensive operation that we can avoid thanks to indirect call wrappers.
This patch gives a 13% increase of performance on a single flow, if the bottleneck is the thread reading the TCP socket.
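The wrapper macro (from include/linux/indirect_call_wrapper.h) and a hedged sketch of its use in the copy loop:

    #define INDIRECT_CALL_1(f, f1, ...)                                 \
            ({                                                          \
                    likely(f == f1) ? f1(__VA_ARGS__) : f(__VA_ARGS__); \
            })

    /* hedged sketch: the loop tests for the common callback and
     * branches directly instead of paying the retpoline cost
     */
    n = INDIRECT_CALL_1(cb, simple_copy_to_iter,
                        skb->data + offset, copy, data, to);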
Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e427cad6 | 28-Feb-2020 | Paolo Abeni <pabeni@redhat.com>
net: datagram: drop 'destructor' argument from several helpers
The only users for such argument are the UDP protocol and the UNIX socket family. We can safely reclaim the accounted memory directly from the UDP code and, after the previous patch, we can do scm stats accounting outside the datagram helpers.
Overall this cleans up a bit some datagram-related helpers, and avoids an indirect call per packet in the UDP receive path.
v1 -> v2:
- call scm_stat_del() only when not peeking - Kirill
- fix build issue with CONFIG_INET_ESPINTCP
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b50b0580 | 25-Nov-2019 | Sabrina Dubroca <sd@queasysnail.net>
net: add queue argument to __skb_wait_for_more_packets and __skb_{,try_}recv_datagram
This will be used by ESP over TCP to handle the queue of IKE messages.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
7c422d0c | 24-Oct-2019 | Eric Dumazet <edumazet@google.com>
net: add READ_ONCE() annotation in __skb_wait_for_more_packets()
__skb_wait_for_more_packets() can be called while other cpus can feed packets to the socket receive queue.
KCSAN reported:
BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb
write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0:
 __skb_insert include/linux/skbuff.h:1852 [inline]
 __skb_queue_before include/linux/skbuff.h:1958 [inline]
 __skb_queue_tail include/linux/skbuff.h:1991 [inline]
 __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470
 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
 udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1:
 __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100
 __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683
 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
 __do_sys_recvmmsg net/socket.c:2703 [inline]
 __se_sys_recvmmsg net/socket.c:2696 [inline]
 __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3f926af3 | 24-Oct-2019 | Eric Dumazet <edumazet@google.com>
net: use skb_queue_empty_lockless() in busy poll contexts
Busy polling usually runs without locks. Let's use skb_queue_empty_lockless() instead of skb_queue_empty().
Also uses READ_ONCE() in __skb_try_recv_datagram() to address a similar potential problem.
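For reference, the lockless check is an annotated read of the queue head (as defined in include/linux/skbuff.h):

    /* Lockless variant: READ_ONCE() prevents the compiler from
     * tearing or refetching the load while another CPU links or
     * unlinks skbs.
     */
    static inline bool skb_queue_empty_lockless(const struct sk_buff_head *list)
    {
            return READ_ONCE(list->next) == (const struct sk_buff *) list;
    }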
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ef7cf57 | 24-Oct-2019 | Eric Dumazet <edumazet@google.com>
net: use skb_queue_empty_lockless() in poll() handlers
Many poll() handlers are lockless. Using skb_queue_empty_lockless() instead of skb_queue_empty() is more appropriate.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>