#
9633e937 |
| 03-Jun-2024 |
Jason Xing <kernelxing@tencent.com> |
mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB
Like previous patch does in TCP, we need to adhere to RFC 1213:
"tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which
mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB
Like previous patch does in TCP, we need to adhere to RFC 1213:
"tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT."
So let's consider CLOSE-WAIT sockets.
The logic of counting When we increment the counter? a) Only if we change the state to ESTABLISHED.
When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK.
Fixes: d9cd27b8cd19 ("mptcp: add CurrEstab MIB counter support") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5eae7a82 |
| 14-May-2024 |
Matthieu Baerts (NGI0) <matttbe@kernel.org> |
mptcp: prefer strscpy over strcpy
strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehav
mptcp: prefer strscpy over strcpy
strcpy() performs no bounds checking on the destination buffer. This could result in linear overflows beyond the end of the buffer, leading to all kinds of misbehaviors. The safe replacement is strscpy() [1].
This is in preparation of a possible future step where all strcpy() uses will be removed in favour of strscpy() [2].
This fixes CheckPatch warnings:
WARNING: Prefer strscpy over strcpy
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1] Link: https://github.com/KSPP/linux/issues/88 [2] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20240514011335.176158-6-martineau@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
92ef0fd5 |
| 09-May-2024 |
Jens Axboe <axboe@kernel.dk> |
net: change proto and proto_ops accept type
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then
net: change proto and proto_ops accept type
Rather than pass in flags, error pointer, and whether this is a kernel invocation or not, add a struct proto_accept_arg struct as the argument. This then holds all of these arguments, and prepares accept for being able to pass back more information.
No functional changes in this patch.
Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
fb7a0d33 |
| 29-Apr-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: ensure snd_nxt is properly initialized on connect
Christoph reported a splat hinting at a corrupted snd_una:
WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x
mptcp: ensure snd_nxt is properly initialized on connect
Christoph reported a splat hinting at a corrupted snd_una:
WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK>
When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values.
We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice.
Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time.
Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a86a0661 |
| 29-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
net: move sysctl_max_skb_frags to net_hotdata
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google
net: move sysctl_max_skb_frags to net_hotdata
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
215d4024 |
| 25-Apr-2024 |
Jason Xing <kernelxing@tencent.com> |
mptcp: introducing a helper into active reset logic
Since we have mapped every mptcp reset reason definition in enum sk_rst_reason, introducing a new helper can cover some missing places where we ha
mptcp: introducing a helper into active reset logic
Since we have mapped every mptcp reset reason definition in enum sk_rst_reason, introducing a new helper can cover some missing places where we have already set the subflow->reset_reason.
Note: using SK_RST_REASON_NOT_SPECIFIED is the same as SK_RST_REASON_MPTCP_RST_EUNSPEC. They are both unknown. So we can convert it directly.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
5691276b |
| 25-Apr-2024 |
Jason Xing <kernelxing@tencent.com> |
rstreason: prepare for active reset
Like what we did to passive reset: only passing possible reset reason in each active reset path.
No functional changes.
Signed-off-by: Jason Xing <kernelxing@te
rstreason: prepare for active reset
Like what we did to passive reset: only passing possible reset reason in each active reset path.
No functional changes.
Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
18d82cde |
| 10-Apr-2024 |
Geliang Tang <tanggeliang@kylinos.cn> |
mptcp: add last time fields in mptcp_info
This patch adds "last time" fields last_data_sent, last_data_recv and last_ack_recv in struct mptcp_sock to record the last time data_sent, data_recv and ac
mptcp: add last time fields in mptcp_info
This patch adds "last time" fields last_data_sent, last_data_recv and last_ack_recv in struct mptcp_sock to record the last time data_sent, data_recv and ack_recv happened. They all are initialized as tcp_jiffies32 in __mptcp_init_sock(), and updated as tcp_jiffies32 too when data is sent in __subflow_push_pending(), data is received in __mptcp_move_skbs_from_subflow(), and ack is received in ack_update_msk().
Similar to tcpi_last_data_sent, tcpi_last_data_recv and tcpi_last_ack_recv exposed with TCP, this patch exposes the last time "an action happened" for MPTCP in mptcp_info, named mptcpi_last_data_sent, mptcpi_last_data_recv and mptcpi_last_ack_recv, calculated in mptcp_diag_fill_info() as the time deltas between now and the newly added last time fields in mptcp_sock.
Since msk->last_ack_recv needs to be protected by mptcp_data_lock/unlock, and lock_sock_fast can sleep and be quite slow, move the entire mptcp_data_lock/unlock block after the lock/unlock_sock_fast block. Then mptcpi_last_data_sent and mptcpi_last_data_recv are set in lock/unlock_sock_fast block, while mptcpi_last_ack_recv is set in mptcp_data_lock/unlock block, which is protected by a spinlock and should not block for too long.
Also add three reserved bytes in struct mptcp_info not to have holes in this structure exposed to userspace.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/446 Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240410-upstream-net-next-20240405-mptcp-last-time-info-v2-1-f95bd6b33e51@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f410cbea |
| 04-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
tcp: annotate data-races around tp->window_clamp
tp->window_clamp can be read locklessly, add READ_ONCE() and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by
tcp: annotate data-races around tp->window_clamp
tp->window_clamp can be read locklessly, add READ_ONCE() and WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240404114231.2195171-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7a1b3490 |
| 29-Mar-2024 |
Davide Caratti <dcaratti@redhat.com> |
mptcp: don't account accept() of non-MPC client as fallback to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this i
mptcp: don't account accept() of non-MPC client as fallback to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they accept non-MPC connections. As reported by Christoph, this is "surprising" because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be incremented when a connection was seen using MPTCP options, then a fallback to TCP has been done. Let's do that by incrementing it when the subflow context of an inbound MPC connection attempt is dropped. Also, update mptcp_connect.sh kselftest, to ensure that the above MIB does not increment in case a pure TCP client connects to a MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d5dfbfa2 |
| 05-Mar-2024 |
Geliang Tang <tanggeliang@kylinos.cn> |
mptcp: drop duplicate header inclusions
The headers net/tcp.h, net/genetlink.h and uapi/linux/mptcp.h are included in protocol.h already, no need to include them again directly. This patch removes t
mptcp: drop duplicate header inclusions
The headers net/tcp.h, net/genetlink.h and uapi/linux/mptcp.h are included in protocol.h already, no need to include them again directly. This patch removes these duplicate header inclusions.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-1-c436ba5e569b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
29b5e5ef |
| 01-Mar-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: implement TCP_NOTSENT_LOWAT support
Add support for such socket option storing the user-space provided value in a new msk field, and using such data to implement the _mptcp_stream_memory_free
mptcp: implement TCP_NOTSENT_LOWAT support
Add support for such socket option storing the user-space provided value in a new msk field, and using such data to implement the _mptcp_stream_memory_free() helper, similar to the TCP one.
To avoid adding more indirect calls in the fast path, open-code a variant of sk_stream_memory_free() in mptcp_sendmsg() and add direct calls to the mptcp stream memory free helper where possible.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/464 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
037db6ea |
| 01-Mar-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: cleanup writer wake-up
After commit 5cf92bbadc58 ("mptcp: re-enable sndbuf autotune"), the MPTCP_NOSPACE bit is redundant: it is always set and cleared together with SOCK_NOSPACE.
Let's drop
mptcp: cleanup writer wake-up
After commit 5cf92bbadc58 ("mptcp: re-enable sndbuf autotune"), the MPTCP_NOSPACE bit is redundant: it is always set and cleared together with SOCK_NOSPACE.
Let's drop the first and always relay on the latter, dropping a bunch of useless code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
10048689 |
| 23-Feb-2024 |
Davide Caratti <dcaratti@redhat.com> |
mptcp: fix double-free on socket dismantle
when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value a
mptcp: fix double-free on socket dismantle
when MPTCP server accepts an incoming connection, it clones its listener socket. However, the pointer to 'inet_opt' for the new socket has the same value as the original one: as a consequence, on program exit it's possible to observe the following splat:
BUG: KASAN: double-free in inet_sock_destruct+0x54f/0x8b0 Free of addr ffff888485950880 by task swapper/25/0
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Not tainted 6.8.0-rc1+ #609 Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013 Call Trace: <IRQ> dump_stack_lvl+0x32/0x50 print_report+0xca/0x620 kasan_report_invalid_free+0x64/0x90 __kasan_slab_free+0x1aa/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 rcu_do_batch+0x34e/0xd90 rcu_core+0x559/0xac0 __do_softirq+0x183/0x5a4 irq_exit_rcu+0x12d/0x170 sysvec_apic_timer_interrupt+0x6b/0x80 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:cpuidle_enter_state+0x175/0x300 Code: 30 00 0f 84 1f 01 00 00 83 e8 01 83 f8 ff 75 e5 48 83 c4 18 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc fb 45 85 ed <0f> 89 60 ff ff ff 48 c1 e5 06 48 c7 43 18 00 00 00 00 48 83 44 2b RSP: 0018:ffff888481cf7d90 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff88887facddc8 RCX: 0000000000000000 RDX: 1ffff1110ff588b1 RSI: 0000000000000019 RDI: ffff88887fac4588 RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000043080 R10: 0009b02ea273363f R11: ffff88887fabf42b R12: ffffffff932592e0 R13: 0000000000000004 R14: 0000000000000000 R15: 00000022c880ec80 cpuidle_enter+0x4a/0xa0 do_idle+0x310/0x410 cpu_startup_entry+0x51/0x60 start_secondary+0x211/0x270 secondary_startup_64_no_verify+0x184/0x18b </TASK>
Allocated by task 6853: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0xa6/0xb0 __kmalloc+0x1eb/0x450 cipso_v4_sock_setattr+0x96/0x360 netlbl_sock_setattr+0x132/0x1f0 selinux_netlbl_socket_post_create+0x6c/0x110 selinux_socket_post_create+0x37b/0x7f0 security_socket_post_create+0x63/0xb0 __sock_create+0x305/0x450 __sys_socket_create.part.23+0xbd/0x130 __sys_socket+0x37/0xb0 __x64_sys_socket+0x6f/0xb0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76
Freed by task 6858: kasan_save_stack+0x1c/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x12c/0x1f0 kfree+0xed/0x2e0 inet_sock_destruct+0x54f/0x8b0 __sk_destruct+0x48/0x5b0 subflow_ulp_release+0x1f0/0x250 tcp_cleanup_ulp+0x6e/0x110 tcp_v4_destroy_sock+0x5a/0x3a0 inet_csk_destroy_sock+0x135/0x390 tcp_fin+0x416/0x5c0 tcp_data_queue+0x1bc8/0x4310 tcp_rcv_state_process+0x15a3/0x47b0 tcp_v4_do_rcv+0x2c1/0x990 tcp_v4_rcv+0x41fb/0x5ed0 ip_protocol_deliver_rcu+0x6d/0x9f0 ip_local_deliver_finish+0x278/0x360 ip_local_deliver+0x182/0x2c0 ip_rcv+0xb5/0x1c0 __netif_receive_skb_one_core+0x16e/0x1b0 process_backlog+0x1e3/0x650 __napi_poll+0xa6/0x500 net_rx_action+0x740/0xbb0 __do_softirq+0x183/0x5a4
The buggy address belongs to the object at ffff888485950880 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888485950880, ffff8884859508c0)
The buggy address belongs to the physical page: page:0000000056d1e95e refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888485950700 pfn:0x485950 flags: 0x57ffffc0000800(slab|node=1|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0057ffffc0000800 ffff88810004c640 ffffea00121b8ac0 dead000000000006 raw: ffff888485950700 0000000000200019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888485950780: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888485950880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888485950900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888485950980: 00 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc
Something similar (a refcount underflow) happens with CALIPSO/IPv6. Fix this by duplicating IP / IPv6 options after clone, so that ip{,6}_sock_destruct() doesn't end up freeing the same memory area twice.
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Cc: stable@vger.kernel.org Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-8-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
adf1bb78 |
| 23-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: fix snd_wnd initialization for passive socket
Such value should be inherited from the first subflow, but passive sockets always used 'rsk_rcv_wnd'.
Fixes: 6f8a612a33e4 ("mptcp: keep track of
mptcp: fix snd_wnd initialization for passive socket
Such value should be inherited from the first subflow, but passive sockets always used 'rsk_rcv_wnd'.
Fixes: 6f8a612a33e4 ("mptcp: keep track of advertised windows right edge") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-5-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
b9cd26f6 |
| 23-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: push at DSS boundaries
when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs b
mptcp: push at DSS boundaries
when inserting not contiguous data in the subflow write queue, the protocol creates a new skb and prevent the TCP stack from merging it later with already queued skbs by setting the EOR marker.
Still no push flag is explicitly set at the end of previous GSO packet, making the aggregation on the receiver side sub-optimal - and packetdrill self-tests less predictable.
Explicitly mark the end of not contiguous DSS with the push flag.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-4-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a7cfe776 |
| 15-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: fix data races on local_id
The local address id is accessed lockless by the NL PM, add all the required ONCE annotation. There is a caveat: the local id can be initialized late in the subflow
mptcp: fix data races on local_id
The local address id is accessed lockless by the NL PM, add all the required ONCE annotation. There is a caveat: the local id can be initialized late in the subflow life-cycle, and its validity is controlled by the local_id_valid flag.
Remove such flag and encode the validity in the local_id field itself with negative value before initialization. That allows accessing the field consistently with a single read operation.
Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e4a0fa47 |
| 08-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: corner case locking for rx path fields initialization
Most MPTCP-level related fields are under the mptcp data lock protection, but are written one-off without such lock at MPC complete time,
mptcp: corner case locking for rx path fields initialization
Most MPTCP-level related fields are under the mptcp data lock protection, but are written one-off without such lock at MPC complete time, both for the client and the server
Leverage the mptcp_propagate_state() infrastructure to move such initialization under the proper lock client-wise.
The server side critical init steps are done by mptcp_subflow_fully_established(): ensure the caller properly held the relevant lock, and avoid acquiring the same lock in the nested scopes.
There are no real potential races, as write access to such fields is implicitly serialized by the MPTCP state machine; the primary goal is consistency.
Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3f83d8a7 |
| 08-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: fix more tx path fields initialization
The 'msk->write_seq' and 'msk->snd_nxt' are always updated under the msk socket lock, except at MPC handshake completiont time.
Builds-up on the previo
mptcp: fix more tx path fields initialization
The 'msk->write_seq' and 'msk->snd_nxt' are always updated under the msk socket lock, except at MPC handshake completiont time.
Builds-up on the previous commit to move such init under the relevant lock.
There are no known problems caused by the potential race, the primary goal is consistency.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
013e3179 |
| 08-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: fix rcv space initialization
mptcp_rcv_space_init() is supposed to happen under the msk socket lock, but active msk socket does that without such protection.
Leverage the existing mptcp_prop
mptcp: fix rcv space initialization
mptcp_rcv_space_init() is supposed to happen under the msk socket lock, but active msk socket does that without such protection.
Leverage the existing mptcp_propagate_state() helper to that extent. We need to ensure mptcp_rcv_space_init will happen before mptcp_rcv_space_adjust(), and the release_cb does not assure that: explicitly check for such condition.
While at it, move the wnd_end initialization out of mptcp_rcv_space_init(), it never belonged there.
Note that the race does not produce ill effect in practice, but change allows cleaning-up and defying better the locking model.
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
bdd70eb6 |
| 08-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: drop the push_pending field
Such field is there to avoid acquiring the data lock in a few spots, but it adds complexity to the already non trivial locking schema.
All the relevant call sites
mptcp: drop the push_pending field
Such field is there to avoid acquiring the data lock in a few spots, but it adds complexity to the already non trivial locking schema.
All the relevant call sites (mptcp-level re-injection, set socket options), are slow-path, drop such field in favor of 'cb_flags', adding the relevant locking.
This patch could be seen as an improvement, instead of a fix. But it simplifies the next patch. The 'Fixes' tag has been added to help having this series backported to stable.
Fixes: e9d09baca676 ("mptcp: avoid atomic bit manipulation when possible") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
28e5c138 |
| 02-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: annotate lockless accesses around read-mostly fields
The following MPTCP socket fields:
- can_ack - fully_established - rcv_data_fin - snd_data_fin_enable - rcv_fastclose - use_64bit_a
mptcp: annotate lockless accesses around read-mostly fields
The following MPTCP socket fields:
- can_ack - fully_established - rcv_data_fin - snd_data_fin_enable - rcv_fastclose - use_64bit_ack
are accessed without any lock, add the appropriate annotation.
The schema is safe as each field can change its value at most once in the whole mptcp socket life cycle.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b9f45543 |
| 02-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: annotate lockless access for token
The token field is manipulated under the msk socket lock and accessed lockless in a few spots, add proper ONCE annotation
Signed-off-by: Paolo Abeni <paben
mptcp: annotate lockless access for token
The token field is manipulated under the msk socket lock and accessed lockless in a few spots, add proper ONCE annotation
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9426ce47 |
| 02-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: annotate lockless access for RX path fields
The following fields:
- ack_seq - snd_una - wnd_end - rmem_fwd_alloc
are protected by the data lock end accessed lockless in a few spots. Ens
mptcp: annotate lockless access for RX path fields
The following fields:
- ack_seq - snd_una - wnd_end - rmem_fwd_alloc
are protected by the data lock end accessed lockless in a few spots. Ensure ONCE annotation for write (under such lock) and for lockless read.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d440a4e2 |
| 02-Feb-2024 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: annotate lockless access for the tx path
The mptcp-level TX path info (write_seq, bytes_sent, snd_nxt) are under the msk socket lock protection, and are accessed lockless in a few spots.
Alw
mptcp: annotate lockless access for the tx path
The mptcp-level TX path info (write_seq, bytes_sent, snd_nxt) are under the msk socket lock protection, and are accessed lockless in a few spots.
Always mark the write operations with WRITE_ONCE, read operations outside the lock with READ_ONCE and drop the annotation for read under such lock.
To simplify the annotations move mptcp_pending_data_fin_ack() from __mptcp_data_acked() to __mptcp_clean_una(), under the msk socket lock, where such call would belong.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|