#
3ef3905a |
| 17-Mar-2022 |
Yonglong Li <liyonglong@chinatelecom.cn> |
mptcp: Fix crash due to tcp_tsorted_anchor was initialized before release skb
Got crash when doing pressure test of mptcp:
===========================================================================
dst_release: dst:ffffa06ce6e5c058 refcnt:-1
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffa06ce6e5c058
PGD 190a01067 P4D 190a01067 PUD 43fffb067 PMD 22e403063 PTE 8000000226e5c063
Oops: 0011 [#1] SMP PTI
CPU: 7 PID: 7823 Comm: kworker/7:0 Kdump: loaded Tainted: G E
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.2.1 04/01/2014
Call Trace:
 ? skb_release_head_state+0x68/0x100
 ? skb_release_all+0xe/0x30
 ? kfree_skb+0x32/0xa0
 ? mptcp_sendmsg_frag+0x57e/0x750
 ? __mptcp_retrans+0x21b/0x3c0
 ? __switch_to_asm+0x35/0x70
 ? mptcp_worker+0x25e/0x320
 ? process_one_work+0x1a7/0x360
 ? worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 ? kthread+0x112/0x130
 ? kthread_flush_work_fn+0x10/0x10
 ? ret_from_fork+0x35/0x40
===========================================================================
In __mptcp_alloc_tx_skb() the skb is allocated and skb->tcp_tsorted_anchor is initialized. Under memory pressure sk_wmem_schedule() returns false and the skb is freed with kfree_skb(). At that point skb->_skb_refdst is not NULL, because _skb_refdst and tcp_tsorted_anchor are stored in the same memory, so kfree_skb() tries to release a dst and crashes.
Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache") Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20220317220953.426024-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
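A minimal sketch of the error path described above, assuming the existing helpers tcp_skb_entail() and tcp_skb_tsorted_anchor_cleanup() (the latter zeroes the aliased _skb_refdst field) are the ones involved; the exact upstream patch may differ in detail:

static struct sk_buff *__mptcp_alloc_tx_skb(struct sock *sk, struct sock *ssk, gfp_t gfp)
{
        struct sk_buff *skb;

        /* the allocation helper initializes skb->tcp_tsorted_anchor */
        skb = __mptcp_do_alloc_tx_skb(sk, gfp);
        if (!skb)
                return NULL;

        if (likely(sk_wmem_schedule(ssk, skb->truesize))) {
                tcp_skb_entail(ssk, skb);
                return skb;
        }

        /* tcp_tsorted_anchor shares storage with _skb_refdst: clear it
         * before kfree_skb(), otherwise the free path treats the list
         * pointers as a dst reference and crashes.
         */
        tcp_skb_tsorted_anchor_cleanup(skb);
        kfree_skb(skb);
        return NULL;
}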
|
#
4cf86ae8 |
| 07-Mar-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: strict local address ID selection
The address ID selection for MPJ subflows created in response to incoming ADD_ADDR option is currently unreliable: it happens at MPJ socket creation time, when the local address could be unknown.
Additionally, if no local endpoint is available for the local address, a new dummy endpoint is created, confusing user-land.
This change refactors the code to move the address ID selection into the rebuild_header() helper, where the local address eventually selected by the route lookup is finally known. If the address in use is not mapped by any endpoint - and thus can't be advertised or removed - pick id 0 instead of allocating a new endpoint.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
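A hedged sketch of the resulting flow: the ID lookup moves into the subflow rebuild_header() path, after the route lookup has fixed the source address. The helper names below (subflow_chk_local_id(), mptcp_pm_get_local_id()) follow the surrounding MPTCP code, but the exact patch shape is assumed:

static int subflow_chk_local_id(struct sock *sk)
{
        struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
        struct mptcp_sock *msk = mptcp_sk(subflow->conn);
        int err;

        if (likely(subflow->local_id >= 0))
                return 0;

        /* resolves to a configured endpoint id, or 0 when the address
         * is not backed by any endpoint
         */
        err = mptcp_pm_get_local_id(msk, (struct sock_common *)sk);
        if (err < 0)
                return err;

        subflow->local_id = err;
        return 0;
}

static int subflow_rebuild_header(struct sock *sk)
{
        int err = subflow_chk_local_id(sk);

        if (unlikely(err < 0))
                return err;

        return inet_sk_rebuild_header(sk);
}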
|
#
0eb4e7ee |
| 07-Mar-2022 |
Geliang Tang <geliang.tang@suse.com> |
mptcp: add tracepoint in mptcp_sendmsg_frag
The tracepoint in get_mapping_status() only dumps the incoming mpext fields. This patch adds a new tracepoint in mptcp_sendmsg_frag() to dump the outgoing mpext too.
Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
877d11f0 |
| 25-Feb-2022 |
Mat Martineau <mathew.j.martineau@linux.intel.com> |
mptcp: Correctly set DATA_FIN timeout when number of retransmits is large
Syzkaller with UBSAN uncovered a scenario where a large number of DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN timeout calculation:
================================================================================
UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
 __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
 mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
 __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
 mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
 process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
 worker_thread+0x95/0xe10 kernel/workqueue.c:2454
 kthread+0x2f4/0x3b0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
================================================================================
This change limits the maximum timeout by limiting the size of the shift, which keeps all intermediate values in-bounds.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259 Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
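A sketch of the fix along the lines described above: clamp the retransmit counter before using it as a shift exponent, so every intermediate value stays within 32 bits and the timeout never exceeds TCP_RTO_MAX (assumed shape, not necessarily the verbatim patch):

static void mptcp_set_datafin_timeout(const struct sock *sk)
{
        struct inet_connection_sock *icsk = inet_csk(sk);
        u32 retransmits;

        /* cap the exponent: TCP_RTO_MIN << retransmits can then never
         * pass TCP_RTO_MAX nor shift past the width of a 32-bit value
         */
        retransmits = min_t(u32, icsk->icsk_retransmits,
                            ilog2(TCP_RTO_MAX / TCP_RTO_MIN));

        mptcp_sk(sk)->timer_ival = TCP_RTO_MIN << retransmits;
}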
|
#
07c2c7a3 |
| 25-Feb-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: accurate SIOCOUTQ for fallback socket
The MPTCP SIOCOUTQ implementation is not very accurate in case of fallback: it only measures the data in the MPTCP-level write queue, but it does not take into account the subflow write queue utilization. In case of fallback the former can be empty, while the latter is not.
The above produces sporadic self-test issues and can confuse legitimate user-space applications.
Fix the issue by additionally querying the subflow in case of fallback.
Fixes: 644807e3e462 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
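A simplified, hedged sketch of the idea (helper name and details are illustrative): after a fallback the pending bytes live in the single subflow's write queue, so they have to be added to the MPTCP-level answer.

static u64 mptcp_outq_bytes(const struct mptcp_sock *msk)
{
        u64 delta = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_una);

        if (__mptcp_check_fallback(msk) && msk->first) {
                const struct tcp_sock *tp = tcp_sk(msk->first);

                /* the msk-level write queue is empty after fallback,
                 * the unacked data sits in the subflow write queue
                 */
                delta += READ_ONCE(tp->write_seq) - READ_ONCE(tp->snd_una);
        }
        return delta;
}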
|
#
269bda9e |
| 06-Jan-2022 |
Mat Martineau <mathew.j.martineau@linux.intel.com> |
mptcp: Check reclaim amount before reducing allocation
syzbot found a page counter underflow that was triggered by MPTCP's reclaim code:
page_counter underflow: -4294964789 nr_pages=4294967295
WARNING: CPU: 2 PID: 3785 at mm/page_counter.c:56 page_counter_cancel+0xcf/0xe0 mm/page_counter.c:56
Modules linked in:
CPU: 2 PID: 3785 Comm: kworker/2:6 Not tainted 5.16.0-rc1-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:page_counter_cancel+0xcf/0xe0 mm/page_counter.c:56
Code: c7 04 24 00 00 00 00 45 31 f6 eb 97 e8 2a 2b b5 ff 4c 89 ea 48 89 ee 48 c7 c7 00 9e b8 89 c6 05 a0 c1 ba 0b 01 e8 95 e4 4b 07 <0f> 0b eb a8 4c 89 e7 e8 25 5a fb ff eb c7 0f 1f 00 41 56 41 55 49
RSP: 0018:ffffc90002d4f918 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff88806a494120 RCX: 0000000000000000
RDX: ffff8880688c41c0 RSI: ffffffff815e8f28 RDI: fffff520005a9f15
RBP: ffffffff000009cb R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815e2cfe R11: 0000000000000000 R12: ffff88806a494120
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88802cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2de21000 CR3: 000000005ad59000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 page_counter_uncharge+0x2e/0x60 mm/page_counter.c:160
 drain_stock+0xc1/0x180 mm/memcontrol.c:2219
 refill_stock+0x139/0x2f0 mm/memcontrol.c:2271
 __sk_mem_reduce_allocated+0x24d/0x550 net/core/sock.c:2945
 __mptcp_rmem_reclaim net/mptcp/protocol.c:167 [inline]
 __mptcp_mem_reclaim_partial+0x124/0x410 net/mptcp/protocol.c:975
 mptcp_mem_reclaim_partial net/mptcp/protocol.c:982 [inline]
 mptcp_alloc_tx_skb net/mptcp/protocol.c:1212 [inline]
 mptcp_sendmsg_frag+0x18c6/0x2190 net/mptcp/protocol.c:1279
 __mptcp_push_pending+0x232/0x720 net/mptcp/protocol.c:1545
 mptcp_release_cb+0xfe/0x200 net/mptcp/protocol.c:2975
 release_sock+0xb4/0x1b0 net/core/sock.c:3306
 mptcp_worker+0x51e/0xc10 net/mptcp/protocol.c:2443
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
__mptcp_mem_reclaim_partial() could call __mptcp_rmem_reclaim() with a negative value, which passed that negative value to __sk_mem_reduce_allocated() and triggered the splat above.
Check for a reclaim amount that is positive and large enough for __mptcp_rmem_reclaim() to actually adjust rmem_fwd_alloc (much like the sk_mem_reclaim_partial() code the function is based on).
v2: Use '>' instead of '>=', since SK_MEM_QUANTUM - 1 would get right-shifted into nothing by __mptcp_rmem_reclaim.
Fixes: 6511882cdd82 ("mptcp: allocate fwd memory separately on the rx and tx path") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/252 Reported-and-tested-by: syzbot+bc9e2d2dbcb347dd215a@syzkaller.appspotmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
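A sketch of the check described above (field and helper names follow the ones quoted in the splat; this is not a verbatim copy of the patch):

static void __mptcp_mem_reclaim_partial(struct sock *sk)
{
        int reclaimable = mptcp_sk(sk)->rmem_fwd_alloc - sk_unused_reserved_mem(sk);

        lockdep_assert_held_once(&sk->sk_lock.slock);

        /* only reclaim when the amount is positive and large enough to
         * survive the SK_MEM_QUANTUM right-shift inside the helper;
         * '>' rather than '>=' per the v2 note above
         */
        if (reclaimable > SK_MEM_QUANTUM)
                __mptcp_rmem_reclaim(sk, reclaimable - 1);

        sk_mem_reclaim_partial(sk);
}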
|
#
e9d09bac |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: avoid atomic bit manipulation when possible
Currently the msk->flags bitmask carries both state for the mptcp_release_cb() - mostly touched under the mptcp data lock - and other state info touched even outside such lock scope.
As a consequence, msk->flags is always manipulated with atomic operations.
This change splits such a bitmask into two separate fields, so that we use plain bit operations when touching the cb-related info.
The MPTCP_PUSH_PENDING bit needs additional care, as it is the only CB related field currently accessed either under the mptcp data lock or the mptcp socket lock. Let's add another mask just for such bit's sake.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
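An illustrative sketch of the resulting pattern; the cb_flags field name and the exact placement of individual flags are assumptions. Bits touched only with the mptcp data lock held use plain bitops, everything else keeps the atomic helpers:

/* cb-related flag, only set with the msk data lock held */
static void mptcp_mark_push_pending(struct sock *sk)
{
        struct mptcp_sock *msk = mptcp_sk(sk);

        mptcp_data_lock(sk);            /* spin_lock_bh() on the sk slock */
        __set_bit(MPTCP_PUSH_PENDING, &msk->cb_flags);
        mptcp_data_unlock(sk);
}

/* flag set from contexts that do not hold the data lock: stays atomic */
static void mptcp_mark_retransmit_work(struct sock *sk)
{
        if (!test_and_set_bit(MPTCP_WORK_RTX, &mptcp_sk(sk)->flags))
                mptcp_schedule_work(sk);
}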
|
#
3e501490 |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: cleanup MPJ subflow list handling
We can simplify the join list handling leveraging the mptcp_release_cb(): if we can acquire the msk socket lock at mptcp_finish_join time, move the new subflow directly into the conn_list, otherwise place it on join_list and let the release_cb process such list.
Since pending MPJ connections are now always processed in a timely way, we can avoid flushing the join list every time we have to process all the current subflows.
Additionally we can now use the mptcp data lock to protect the join_list, removing the additional spin lock.
Finally, the MPJ handshake is now always finalized under the msk socket lock, so we can drop the additional synchronization between mptcp_finish_join() and mptcp_close().
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a88c9e49 |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: do not block subflows creation on errors
If the MPTCP configuration allows for multiple subflow creation, and the first additional subflows never reach the fully established status - e.g. due to packet drops or resets - the in-kernel path manager does not move to the next subflow.
This patch introduces a new PM helper to cope with MPJ subflow creation failures and delays, and hooks it in where appropriate.
Such a helper triggers additional subflow creation as needed and updates the PM subflow counter if the current one is closing.
Additionally, start all the needed additional subflows as soon as the MPTCP socket is fully established, so we don't have to cope with a slow MPJ handshake blocking the next subflow creation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86e39e04 |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: keep track of local endpoint still available for each msk
Include into the path manager status a bitmap tracking the list of local endpoints still available - not yet used - for the relevant mptcp socket.
Keep such a map updated at endpoint creation/deletion time, so that we can easily skip already-used endpoints at local address selection time.
The endpoint used by the initial subflow is lazily accounted at subflow creation time: the usage bitmap is up to date before endpoint selection, and we avoid this unneeded task in some relevant scenarios - e.g. busy servers accepting incoming subflows but not creating any additional ones nor announcing additional addresses.
Overall this allows for fair local endpoints usage in case of subflow failure.
As a side effect, this patch also enforces that each endpoint is used at most once for each mptcp connection.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
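An illustrative sketch of the bookkeeping (struct and helper names here are hypothetical, only the bitmap idea comes from the commit): one bit per endpoint ID, filled at init and cleared the first time an endpoint is bound to a subflow.

struct pm_avail_example {
        DECLARE_BITMAP(id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
};

static void pm_example_init(struct pm_avail_example *pm)
{
        bitmap_fill(pm->id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
}

/* returns true only the first time the endpoint is claimed, enforcing
 * "each endpoint used at most once per connection"
 */
static bool pm_example_claim_endpoint(struct pm_avail_example *pm, u8 id)
{
        return __test_and_clear_bit(id, pm->id_avail_bitmap);
}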
|
#
3d1d6d66 |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: implement support for user-space disconnect
Explicitly handle AF_UNSPEC in mptcp_stream_connect() to allow user space to disconnect established MPTCP connections.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
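A hedged sketch of the dispatch (helper name hypothetical; the regular connect path is elided): mirroring plain TCP, a connect() with sa_family == AF_UNSPEC on an established socket is treated as a request to disconnect.

/* called from mptcp_stream_connect() when uaddr->sa_family == AF_UNSPEC */
static int mptcp_handle_unspec_connect(struct socket *sock, int flags)
{
        int err = mptcp_disconnect(sock->sk, flags);

        sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
        return err;
}

From user space this is the usual idiom: calling connect() again with an AF_UNSPEC address on a connected socket tears the MPTCP connection down while keeping the file descriptor reusable.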
|
#
71ba088c |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: cleanup accept and poll
After the previous patch, msk->subflow will never be deleted during the whole msk lifetime. We no longer need to acquire references to it in mptcp_stream_accept(), and we can use the listener subflow's accept queue to simplify mptcp_poll() for listener sockets.
Overall this removes a lock pair and 4 more atomic operations per accept().
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b29fcfb5 |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: full disconnect implementation
The current mptcp_disconnect() implementation lacks several steps: we additionally need to reset the msk socket state and flush the subflow list.
Factor out the needed helper to avoid code duplication.
Additionally ensure that the initial subflow is disposed of only after mptcp_close(); just reset it at disconnect time.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58cd405b |
| 07-Jan-2022 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: keep snd_una updated for fallback socket
After shutdown, for fallback MPTCP sockets, we always have
write_seq == snd_una+1
The above will confuse the OUTQ ioctl(). Keep snd_una in sync with write_seq even after shutdown.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
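One plausible shape of the change (hedged; the exact spot in the shutdown path is an assumption): since no DATA_FIN ack will ever arrive on a fallback socket, pin snd_una to write_seq explicitly.

/* hypothetical placement in the shutdown/DATA_FIN handling path */
if (__mptcp_check_fallback(msk)) {
        /* no MPTCP-level ack will ever cover the DATA_FIN sequence */
        WRITE_ONCE(msk->snd_una, msk->write_seq);
}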
|
#
3ce0852c |
| 17-Dec-2021 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: enforce HoL-blocking estimation
The MPTCP packet scheduler has sub-optimal behavior with asymmetric subflows: if the faster subflow-level cwin is closed, the packet scheduler can enqueue "too much" data on a slower subflow.
When all the data on the faster subflow is acked, if the MPTCP-level cwin is closed, link utilization becomes suboptimal.
The solution is implementing blest-like[1] HoL-blocking estimation, transmitting only on the subflow with the shorter estimated time to flush the queued memory. If such a subflow's cwin is closed, we wait even if other subflows are available.
This is much simpler than the original blest implementation, as we leverage the pacing rate provided by the TCP socket. To get a more accurate estimate of the subflow linger time, we maintain a per-subflow weighted average of such info.
Additionally, drop the use of magic numbers in favor of newly defined macros and use more meaningful names for the status variables.
[1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137 Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
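A hedged sketch of the estimation: per subflow, the time needed to flush the already-queued bytes is approximated as queued bytes divided by the TCP pacing rate, and transmission is attempted only on the subflow with the smallest estimate. The weighted averaging and fixed-point details mentioned above are omitted; the function name is illustrative.

static struct sock *mptcp_pick_min_linger_subflow(struct mptcp_sock *msk)
{
        struct mptcp_subflow_context *subflow;
        struct sock *best_ssk = NULL;
        u64 best_linger = U64_MAX;

        mptcp_for_each_subflow(msk, subflow) {
                struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
                u64 pace = READ_ONCE(ssk->sk_pacing_rate);
                u64 linger;

                if (!pace)
                        continue;

                /* estimated time to drain the subflow write queue,
                 * in (bytes << 32) / (bytes per second) units
                 */
                linger = div64_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32,
                                   pace);
                if (linger < best_linger) {
                        best_linger = linger;
                        best_ssk = ssk;
                }
        }

        /* if the chosen subflow's cwin is closed, the caller waits rather
         * than spilling data onto a slower subflow
         */
        return best_ssk;
}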
|
#
3d79e375 |
| 14-Dec-2021 |
Maxim Galaganov <max@internet.ru> |
mptcp: fix deadlock in __mptcp_push_pending()
__mptcp_push_pending() may call mptcp_flush_join_list() with subflow socket lock held. If such call hits mptcp_sockopt_sync_all() then subsequently __mptcp_sockopt_sync() could try to lock the subflow socket for itself, causing a deadlock.
sysrq: Show Blocked State
task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x2d6/0x10c0
 ? __mod_memcg_state+0x4d/0x70
 ? csum_partial+0xd/0x20
 ? _raw_spin_lock_irqsave+0x26/0x50
 schedule+0x4e/0xc0
 __lock_sock+0x69/0x90
 ? do_wait_intr_irq+0xa0/0xa0
 __lock_sock_fast+0x35/0x50
 mptcp_sockopt_sync_all+0x38/0xc0
 __mptcp_push_pending+0x105/0x200
 mptcp_sendmsg+0x466/0x490
 sock_sendmsg+0x57/0x60
 __sys_sendto+0xf0/0x160
 ? do_wait_intr_irq+0xa0/0xa0
 ? fpregs_restore_userregs+0x12/0xd0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f9ba546c2d0
RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
 </TASK>
Fix the issue by using __mptcp_flush_join_list() instead of plain mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by Florian. The sockopt sync will be deferred to the workqueue.
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244 Suggested-by: Florian Westphal <fw@strlen.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d6692b3b |
| 14-Dec-2021 |
Florian Westphal <fw@strlen.de> |
mptcp: clear 'kern' flag from fallback sockets
The mptcp ULP extension relies on sk->sk_kern_sock being set correctly: it prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from working for plain tcp sockets (any userspace-exposed socket).
But in case of fallback, accept() can return a plain tcp sk. In such case, sk is still tagged as 'kernel' and setsockopt will work.
This will crash the kernel: the subflow extension has a NULL ctx->conn mptcp socket:
BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
Call Trace:
 tcp_data_ready+0xf8/0x370
 [..]
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8b38217a |
| 03-Dec-2021 |
Maxim Galaganov <max@internet.ru> |
mptcp: expose mptcp_check_and_set_pending
Expose the mptcp_check_and_set_pending() function for use inside MPTCP sockopt code. The next patch will call it when TCP_CORK is cleared or TCP_NODELAY is set on the MPTCP socket in order to push pending data from mptcp_release_cb().
Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
644807e3 |
| 03-Dec-2021 |
Florian Westphal <fw@strlen.de> |
mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
Allow querying the in-sequence data ready for read(), the total bytes in the write queue, and the total bytes in the write queue that have not yet been sent.
v2: remove unneeded READ_ONCE() (Paolo Abeni) v3: check for new data unconditionally in SIOCINQ ioctl (Mat Martineau)
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
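A hedged, simplified sketch of the answers (the real code also guards against LISTEN/SYN states and the fallback case handled by a later fix): SIOCOUTQ is write_seq - snd_una and SIOCOUTQNSD is write_seq - snd_nxt.

static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
{
        struct mptcp_sock *msk = mptcp_sk(sk);
        u64 answ;

        switch (cmd) {
        case SIOCOUTQ:          /* sent + unsent bytes, not yet acked */
                answ = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_una);
                break;
        case SIOCOUTQNSD:       /* queued bytes not yet pushed to any subflow */
                answ = READ_ONCE(msk->write_seq) - READ_ONCE(msk->snd_nxt);
                break;
        /* SIOCINQ (not shown) inspects the msk receive queue to report
         * the in-sequence bytes that read() could return right now
         */
        default:
                return -ENOIOCTLCMD;
        }

        return put_user(answ, (int __user *)arg);
}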
|
#
2c9e7765 |
| 03-Dec-2021 |
Florian Westphal <fw@strlen.de> |
mptcp: add TCP_INQ cmsg support
Support the TCP_INQ setsockopt.
This is a boolean that tells the recvmsg path to include the remaining in-sequence bytes in the cmsg data.
v2: do not use CB(skb)->offset, increment map_seq instead (Paolo Abeni) v3: adjust CB(skb)->map_seq when taking skb from ofo queue (Paolo Abeni)
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/224 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
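A hedged sketch of the recvmsg() side (how the setsockopt value is remembered on the msk is an assumption): when the flag is set, the remaining in-sequence byte count is attached to each read as a TCP_CM_INQ control message.

/* called at the end of mptcp_recvmsg() when TCP_INQ is enabled;
 * 'inq' is the number of in-sequence bytes still ready for read()
 */
static void mptcp_put_inq_cmsg(struct msghdr *msg, int inq)
{
        put_cmsg(msg, SOL_TCP, TCP_CM_INQ, sizeof(inq), &inq);
}

User space enables it with setsockopt(fd, SOL_TCP, TCP_INQ, &one, sizeof(one)) and reads the value back from the cmsg returned by recvmsg().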
show more ...
|
#
bcd97734 |
| 19-Nov-2021 |
Paolo Abeni <pabeni@redhat.com> |
mptcp: use delegate action to schedule 3rd ack retrans
Scheduling a delack in mptcp_established_options_mp() is not a good idea: such function is called by tcp_send_ack() and the pending delayed ack
mptcp: use delegate action to schedule 3rd ack retrans
Scheduling a delack in mptcp_established_options_mp() is not a good idea: such function is called by tcp_send_ack() and the pending delayed ack will be cleared shortly after by the tcp_event_ack_sent() call in __tcp_transmit_skb().
Instead use the mptcp delegated action infrastructure to schedule the delayed ack after the current bh processing completes.
Additionally moves the schedule_3rdack_retransmission() helper into protocol.c to avoid making it visible in a different compilation unit.
Fixes: ec3edaa7ca6ce02f ("mptcp: Add handling of outgoing MP_JOIN requests") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
91b6d325 |
| 15-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: cache align tcp_memory_allocated, tcp_sockets_allocated
tcp_memory_allocated and tcp_sockets_allocated often share a common cache line, a source of false sharing.
Also take care of udp_memory_allocated and mptcp_sockets_allocated.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
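The change itself is essentially a per-counter annotation; a sketch for the TCP pair named in the subject (the udp and mptcp counters mentioned above get the same treatment):

/* give each hot counter its own cache line so writers of one do not
 * false-share with readers/writers of the other
 */
atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp;
EXPORT_SYMBOL(tcp_memory_allocated);

struct percpu_counter tcp_sockets_allocated ____cacheline_aligned_in_smp;
EXPORT_SYMBOL(tcp_sockets_allocated);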
|
#
a52fe46e |
| 27-Oct-2021 |
Eric Dumazet <edumazet@google.com> |
tcp: factorize ip_summed setting
Setting skb->ip_summed to CHECKSUM_PARTIAL can be centralized in tcp_stream_alloc_skb() and __mptcp_do_alloc_tx_skb() instead of being done multiple times.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f401da47 |
| 27-Oct-2021 |
Eric Dumazet <edumazet@google.com> |
tcp: no longer set skb->reserved_tailroom
TCP/MPTCP sendmsg() no longer puts payload in skb->head, so we can remove the no-longer-needed code.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27728ba8 |
| 27-Oct-2021 |
Eric Dumazet <edumazet@google.com> |
tcp: cleanup tcp_remove_empty_skb() use
All tcp_remove_empty_skb() callers now use tcp_write_queue_tail() for the skb argument, we can therefore factorize code.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|