#
581073f6 |
| 15-May-2024 |
Eric Dumazet <edumazet@google.com> |
af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
trafgen performance considerably sank on hosts with many cores after the blamed commit.
packet_read_pending() is very expens
af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
trafgen performance considerably sank on hosts with many cores after the blamed commit.
packet_read_pending() is very expensive, and calling it in af_packet fast path defeats Daniel intent in commit b013840810c2 ("packet: use percpu mmap tx frame pending refcount")
tpacket_destruct_skb() makes room for one packet, we can immediately wakeup a producer, no need to completely drain the tx ring.
Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
86d43e2b |
| 05-Apr-2024 |
Eric Dumazet <edumazet@google.com> |
af_packet: avoid a false positive warning in packet_setsockopt()
Although the code is correct, the following line
copy_from_sockptr(&req_u.req, optval, len));
triggers this warning :
memcpy: det
af_packet: avoid a false positive warning in packet_setsockopt()
Although the code is correct, the following line
copy_from_sockptr(&req_u.req, optval, len));
triggers this warning :
memcpy: detected field-spanning write (size 28) of single field "dst" at include/linux/sockptr.h:49 (size 16)
Refactor the code to be more explicit.
Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
35c3e279 |
| 14-Mar-2024 |
Abhishek Chauhan <quic_abchauha@quicinc.com> |
Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f.
The patch currently broke the bpf selftest test_tc_dti
Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f.
The patch currently broke the bpf selftest test_tc_dtime because uapi field __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono with the original fix as the bit was re-used for userspace timestamp as well to avoid tstamp reset in the forwarding path. To solve this we need to keep mono_delivery_time as is and introduce another bit called user_delivery_time and fall back to the initial proposal of setting the user_delivery_time bit based on sk_clockid set from userspace.
Fixes: 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets") Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
6ebfad33 |
| 14-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
packet: annotate data-races around ignore_outgoing
ignore_outgoing is read locklessly from dev_queue_xmit_nit() and packet_getsockopt()
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
syzbot
packet: annotate data-races around ignore_outgoing
ignore_outgoing is read locklessly from dev_queue_xmit_nit() and packet_getsockopt()
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
syzbot reported:
BUG: KCSAN: data-race in dev_queue_xmit_nit / packet_setsockopt
write to 0xffff888107804542 of 1 bytes by task 22618 on cpu 0: packet_setsockopt+0xd83/0xfd0 net/packet/af_packet.c:4003 do_sock_setsockopt net/socket.c:2311 [inline] __sys_setsockopt+0x1d8/0x250 net/socket.c:2334 __do_sys_setsockopt net/socket.c:2343 [inline] __se_sys_setsockopt net/socket.c:2340 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2340 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75
read to 0xffff888107804542 of 1 bytes by task 27 on cpu 1: dev_queue_xmit_nit+0x82/0x620 net/core/dev.c:2248 xmit_one net/core/dev.c:3527 [inline] dev_hard_start_xmit+0xcc/0x3f0 net/core/dev.c:3547 __dev_queue_xmit+0xf24/0x1dd0 net/core/dev.c:4335 dev_queue_xmit include/linux/netdevice.h:3091 [inline] batadv_send_skb_packet+0x264/0x300 net/batman-adv/send.c:108 batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:392 [inline] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x3f0/0x4b0 net/batman-adv/bat_iv_ogm.c:1700 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0x465/0x990 kernel/workqueue.c:3335 worker_thread+0x526/0x730 kernel/workqueue.c:3416 kthread+0x1d1/0x210 kernel/kthread.c:388 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 27 Comm: kworker/u8:1 Tainted: G W 6.8.0-syzkaller-08073-g480e035fc4c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
Fixes: fa788d986a3a ("packet: add sockopt to ignore outgoing packets") Reported-by: syzbot+c669c1136495a2e7c31f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CANn89i+Z7MfbkBLOv=p7KZ7=K1rKHO4P1OL5LYDCtBiyqsa9oQ@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
76839e2f |
| 08-Mar-2024 |
Juntong Deng <juntong.deng@outlook.com> |
net/packet: Add getsockopt support for PACKET_COPY_THRESH
Currently getsockopt does not support PACKET_COPY_THRESH, and we are unable to get the value of PACKET_COPY_THRESH socket option through get
net/packet: Add getsockopt support for PACKET_COPY_THRESH
Currently getsockopt does not support PACKET_COPY_THRESH, and we are unable to get the value of PACKET_COPY_THRESH socket option through getsockopt.
This patch adds getsockopt support for PACKET_COPY_THRESH.
In addition, this patch converts access to copy_thresh to READ_ONCE/WRITE_ONCE.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/AM6PR03MB58487A9704FD150CF76F542899272@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
885c36e5 |
| 01-Mar-2024 |
Abhishek Chauhan <quic_abchauha@quicinc.com> |
net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc c
net: Re-use and set mono_delivery_time bit for userspace tstamp packets
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers.
Setting the same bit (mono_delivery_time) to avoid dropping of userspace tstamp packets in the forwarding path.
Existing functionality of mono_delivery_time remains unaltered here, instead just extended with userspace tstamp support for bridge forwarding path.
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240301201348.2815102-1-quic_abchauha@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
b8549d85 |
| 04-Jan-2024 |
Jakub Kicinski <kuba@kernel.org> |
net: fill in MODULE_DESCRIPTION() for AF_PACKET
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add description to net/packet/af_packet.c
Acked-by: Willem de Bruijn <willemb@
net: fill in MODULE_DESCRIPTION() for AF_PACKET
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add description to net/packet/af_packet.c
Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240104144119.1319055-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
2f57dd94 |
| 04-Dec-2023 |
Yan Zhai <yan@cloudflare.com> |
packet: add a generic drop reason for receive
Commit da37845fdce2 ("packet: uses kfree_skb() for errors.") switches from consume_skb to kfree_skb to improve error handling. However, this could bring
packet: add a generic drop reason for receive
Commit da37845fdce2 ("packet: uses kfree_skb() for errors.") switches from consume_skb to kfree_skb to improve error handling. However, this could bring a lot of noises when we monitor real packet drops in kfree_skb[1], because in tpacket_rcv or packet_rcv only packet clones can be freed, not actual packets.
Adding a generic drop reason to allow distinguish these "clone drops".
[1]: https://lore.kernel.org/netdev/CABWYdi00L+O30Q=Zah28QwZ_5RU-xcxLFUK2Zj08A8MrLk9jzg@mail.gmail.com/ Fixes: da37845fdce2 ("packet: uses kfree_skb() for errors.") Suggested-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ZW4piNbx3IenYnuw@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
db3fadac |
| 01-Dec-2023 |
Daniel Borkmann <daniel@iogearbox.net> |
packet: Move reference count in packet_sock to atomic_long_t
In some potential instances the reference count on struct packet_sock could be saturated and cause overflows which gets the kernel a bit
packet: Move reference count in packet_sock to atomic_long_t
In some potential instances the reference count on struct packet_sock could be saturated and cause overflows which gets the kernel a bit confused. To prevent this, move to a 64-bit atomic reference count on 64-bit architectures to prevent the possibility of this type to overflow.
Because we can not handle saturation, using refcount_t is not possible in this place. Maybe someday in the future if it changes it could be used. Also, instead of using plain atomic64_t, use atomic_long_t instead. 32-bit machines tend to be memory-limited (i.e. anything that increases a reference uses so much memory that you can't actually get to 2**32 references). 32-bit architectures also tend to have serious problems with 64-bit atomics. Hence, atomic_long_t is the more natural solution.
Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231201131021.19999-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e2bca487 |
| 09-Oct-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_packet: Fix fortified memcpy() without flex array.
Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().")
af_packet: Fix fortified memcpy() without flex array.
Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().").
It introduced a flex array sll_addr_flex in struct sockaddr_ll as a union-ed member with sll_addr to work around the fortified memcpy() check.
However, a userspace program uses a struct that has struct sockaddr_ll in the middle, where a flex array is illegal to exist.
include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t' 24 | __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex); | ^~~~~~~~~~~~~~~~~~~~
To fix the regression, let's go back to the first attempt [1] telling memcpy() the actual size of the array.
Reported-by: Sergei Trofimovich <slyich@gmail.com> Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0] Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1] Fixes: a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
8a989617 |
| 03-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
net/packet: annotate data-races around tp->status
Another syzbot report [1] is about tp->status lockless reads from __packet_get_status()
[1] BUG: KCSAN: data-race in __packet_rcv_has_room / __pack
net/packet: annotate data-races around tp->status
Another syzbot report [1] is about tp->status lockless reads from __packet_get_status()
[1] BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status
write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0: __packet_set_status+0x78/0xa0 net/packet/af_packet.c:407 tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483 deliver_skb net/core/dev.c:2173 [inline] __netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337 __netif_receive_skb_one_core net/core/dev.c:5491 [inline] __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650 sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1: __packet_get_status net/packet/af_packet.c:436 [inline] packet_lookup_frame net/packet/af_packet.c:524 [inline] __tpacket_has_room net/packet/af_packet.c:1255 [inline] __packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298 tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285 deliver_skb net/core/dev.c:2173 [inline] dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243 xmit_one net/core/dev.c:3574 [inline] dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594 __dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244 dev_queue_xmit include/linux/netdevice.h:3088 [inline] can_send+0x4eb/0x5d0 net/can/af_can.c:276 bcm_can_tx+0x314/0x410 net/can/bcm.c:302 bcm_tx_timeout_handler+0xdb/0x260 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749 hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766 __do_softirq+0xc1/0x265 kernel/softirq.c:571 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
value changed: 0x0000000000000000 -> 0x0000000020000081
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ae6db08f |
| 01-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
net/packet: change packet_alloc_skb() to allow bigger paged allocations
packet_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations.
Switch to PAGE_ALLOC_COSTLY_
net/packet: change packet_alloc_skb() to allow bigger paged allocations
packet_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations.
Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x.
Also add logic to increase the linear part if needed.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8bf43be7 |
| 28-Jul-2023 |
Eric Dumazet <edumazet@google.com> |
net: annotate data-races around sk->sk_priority
sk_getsockopt() runs locklessly. This means sk->sk_priority can be read while other threads are changing its value.
Other reads also happen without s
net: annotate data-races around sk->sk_priority
sk_getsockopt() runs locklessly. This means sk->sk_priority can be read while other threads are changing its value.
Other reads also happen without socket lock being held.
Add missing annotations where needed.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3c5b4d69 |
| 28-Jul-2023 |
Eric Dumazet <edumazet@google.com> |
net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-b
net: annotate data-races around sk->sk_mark
sk->sk_mark is often read while another thread could change the value.
Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a0ade840 |
| 24-Jul-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_packet: Fix warning of fortified memcpy() in packet_getname().
syzkaller found a warning in packet_getname() [0], where we try to copy 16 bytes to sockaddr_ll.sll_addr[8].
Some devices (ip6gre,
af_packet: Fix warning of fortified memcpy() in packet_getname().
syzkaller found a warning in packet_getname() [0], where we try to copy 16 bytes to sockaddr_ll.sll_addr[8].
Some devices (ip6gre, vti6, ip6tnl) have 16 bytes address expressed by struct in6_addr. Also, Infiniband has 32 bytes as MAX_ADDR_LEN.
The write seems to overflow, but actually not since we use struct sockaddr_storage defined in __sys_getsockname() and its size is 128 (_K_SS_MAXSIZE) bytes. Thus, we have sufficient room after sll_addr[] as __data[].
To avoid the warning, let's add a flex array member union-ed with sll_addr.
Another option would be to use strncpy() and limit the copied length to sizeof(sll_addr), but it will return the partial address and break an application that passes sockaddr_storage to getsockname().
[0]: memcpy: detected field-spanning write (size 16) of single field "sll->sll_addr" at net/packet/af_packet.c:3604 (size 8) WARNING: CPU: 0 PID: 255 at net/packet/af_packet.c:3604 packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604 Modules linked in: CPU: 0 PID: 255 Comm: syz-executor750 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #4 Hardware name: linux,dummy-virt (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604 lr : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604 sp : ffff800089887bc0 x29: ffff800089887bc0 x28: ffff000010f80f80 x27: 0000000000000003 x26: dfff800000000000 x25: ffff700011310f80 x24: ffff800087d55000 x23: dfff800000000000 x22: ffff800089887c2c x21: 0000000000000010 x20: ffff00000de08310 x19: ffff800089887c20 x18: ffff800086ab1630 x17: 20646c6569662065 x16: 6c676e697320666f x15: 0000000000000001 x14: 1fffe0000d56d7ca x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 3e60944c3da92b00 x8 : 3e60944c3da92b00 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff8000898874f8 x4 : ffff800086ac99e0 x3 : ffff8000803f8808 x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000000 Call trace: packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604 __sys_getsockname+0x168/0x24c net/socket.c:2042 __do_sys_getsockname net/socket.c:2057 [inline] __se_sys_getsockname net/socket.c:2054 [inline] __arm64_sys_getsockname+0x7c/0x94 net/socket.c:2054 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x134/0x240 arch/arm64/kernel/syscall.c:139 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:188 el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Fixes: df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3") Reported-by: syzkaller <syzkaller@googlegroups.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230724213425.22920-3-kuniyu@amazon.com Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
dc97391e |
| 23-Jun-2023 |
David Howells <dhowells@redhat.com> |
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages an
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
6ffc57ea |
| 26-May-2023 |
Eric Dumazet <edumazet@google.com> |
af_packet: do not use READ_ONCE() in packet_bind()
A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt()
This is better handled by reading pkt_sk(sk)->num later in packet_do_bind
af_packet: do not use READ_ONCE() in packet_bind()
A recent patch added READ_ONCE() in packet_bind() and packet_bind_spkt()
This is better handled by reading pkt_sk(sk)->num later in packet_do_bind() while appropriate lock is held.
READ_ONCE() in writers are often an evidence of something being wrong.
Fixes: 822b5a1c17df ("af_packet: Fix data-races of pkt_sk(sk)->num.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230526154342.2533026-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
822b5a1c |
| 24-May-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_packet: Fix data-races of pkt_sk(sk)->num.
syzkaller found a data race of pkt_sk(sk)->num.
The value is changed under lock_sock() and po->bind_lock, so we need READ_ONCE() to access pkt_sk(sk)->
af_packet: Fix data-races of pkt_sk(sk)->num.
syzkaller found a data race of pkt_sk(sk)->num.
The value is changed under lock_sock() and po->bind_lock, so we need READ_ONCE() to access pkt_sk(sk)->num without these locks in packet_bind_spkt(), packet_bind(), and sk_diag_fill().
Note that WRITE_ONCE() is already added by commit c7d2ef5dd4b0 ("net/packet: annotate accesses to po->bind").
BUG: KCSAN: data-race in packet_bind / packet_do_bind
write (marked) to 0xffff88802ffd1cee of 2 bytes by task 7322 on cpu 0: packet_do_bind+0x446/0x640 net/packet/af_packet.c:3236 packet_bind+0x99/0xe0 net/packet/af_packet.c:3321 __sys_bind+0x19b/0x1e0 net/socket.c:1803 __do_sys_bind net/socket.c:1814 [inline] __se_sys_bind net/socket.c:1812 [inline] __x64_sys_bind+0x40/0x50 net/socket.c:1812 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88802ffd1cee of 2 bytes by task 7318 on cpu 1: packet_bind+0xbf/0xe0 net/packet/af_packet.c:3322 __sys_bind+0x19b/0x1e0 net/socket.c:1803 __do_sys_bind net/socket.c:1814 [inline] __se_sys_bind net/socket.c:1812 [inline] __x64_sys_bind+0x40/0x50 net/socket.c:1812 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0300 -> 0x0000
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 7318 Comm: syz-executor.4 Not tainted 6.3.0-13380-g7fddb5b5300c #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 96ec6327144e ("packet: Diag core and basic socket info dumping") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230524232934.50950-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
4063384e |
| 09-May-2023 |
Eric Dumazet <edumazet@google.com> |
net: add vlan_get_protocol_and_depth() helper
Before blamed commit, pskb_may_pull() was used instead of skb_header_pointer() in __vlan_get_protocol() and friends.
Few callers depended on skb->head
net: add vlan_get_protocol_and_depth() helper
Before blamed commit, pskb_may_pull() was used instead of skb_header_pointer() in __vlan_get_protocol() and friends.
Few callers depended on skb->head being populated with MAC header, syzbot caught one of them (skb_mac_gso_segment())
Add vlan_get_protocol_and_depth() to make the intent clearer and use it where sensible.
This is a more generic fix than commit e9d3f80935b6 ("net/af_packet: make sure to pull mac header") which was dealing with a similar issue.
kernel BUG at include/linux/skbuff.h:2655 ! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023 RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline] RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136 Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286 RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012 R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7 FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> [<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419 [<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline] [<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725 [<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313 [<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029 [<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline] [<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142 [<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139 [<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline] [<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline] [<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147 [<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80 [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 469aceddfa3e ("vlan: consolidate VLAN parsing code and limit max parsing depth") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
6a341729 |
| 01-May-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
syzkaller reported a warning below [0].
We can reproduce it by sending 0-byte data from the (AF_PACKET, SOCK_PACKET) socket via some d
af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
syzkaller reported a warning below [0].
We can reproduce it by sending 0-byte data from the (AF_PACKET, SOCK_PACKET) socket via some devices whose dev->hard_header_len is 0.
struct sockaddr_pkt addr = { .spkt_family = AF_PACKET, .spkt_device = "tun0", }; int fd;
fd = socket(AF_PACKET, SOCK_PACKET, 0); sendto(fd, NULL, 0, 0, (struct sockaddr *)&addr, sizeof(addr));
We have a similar fix for the (AF_PACKET, SOCK_RAW) socket as commit dc633700f00f ("net/af_packet: check len when min_header_len equals to 0").
Let's add the same test for the SOCK_PACKET socket.
[0]: skb_assert_len WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 skb_assert_len include/linux/skbuff.h:2552 [inline] WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 __dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Modules linked in: CPU: 1 PID: 19945 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_assert_len include/linux/skbuff.h:2552 [inline] RIP: 0010:__dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Code: 89 de e8 1d a2 85 fd 84 db 75 21 e8 64 a9 85 fd 48 c7 c6 80 2a 1f 86 48 c7 c7 c0 06 1f 86 c6 05 23 cf 27 04 01 e8 fa ee 56 fd <0f> 0b e8 43 a9 85 fd 0f b6 1d 0f cf 27 04 31 ff 89 de e8 e3 a1 85 RSP: 0018:ffff8880217af6e0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90001133000 RDX: 0000000000040000 RSI: ffffffff81186922 RDI: 0000000000000001 RBP: ffff8880217af8b0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888030045640 R13: ffff8880300456b0 R14: ffff888030045650 R15: ffff888030045718 FS: 00007fc5864da640(0000) GS:ffff88806cd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020005740 CR3: 000000003f856003 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> dev_queue_xmit include/linux/netdevice.h:3085 [inline] packet_sendmsg_spkt+0xc4b/0x1230 net/packet/af_packet.c:2066 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b4/0x200 net/socket.c:747 ____sys_sendmsg+0x331/0x970 net/socket.c:2503 ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2557 __sys_sendmmsg+0x18c/0x430 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fc58791de5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fc5864d9cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007fc58791de5d RDX: 0000000000000001 RSI: 0000000020005740 RDI: 0000000000000004 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc58797e530 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- skb len=0 headroom=16 headlen=0 tailroom=304 mac=(16,0) net=(16,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 dev name=sit0 feat=0x00000006401d7869 sk family=17 type=10 proto=0
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
dfc39d40 |
| 19-Apr-2023 |
Jianfeng Tan <henry.tjf@antgroup.com> |
net/packet: support mergeable feature of virtio
Packet sockets, like tap, can be used as the backend for kernel vhost. In packet sockets, virtio net header size is currently hardcoded to be the size
net/packet: support mergeable feature of virtio
Packet sockets, like tap, can be used as the backend for kernel vhost. In packet sockets, virtio net header size is currently hardcoded to be the size of struct virtio_net_hdr, which is 10 bytes; however, it is not always the case: some virtio features, such as mrg_rxbuf, need virtio net header to be 12-byte long.
Mergeable buffers, as a virtio feature, is worthy of supporting: packets that are larger than one-mbuf size will be dropped in vhost worker's handle_rx if mrg_rxbuf feature is not used, but large packets cannot be avoided and increasing mbuf's size is not economical.
With this virtio feature enabled by virtio-user, packet sockets with hardcoded 10-byte virtio net header will parse mac head incorrectly in packet_snd by taking the last two bytes of virtio net header as part of mac header. This incorrect mac header parsing will cause packet to be dropped due to invalid ether head checking in later under-layer device packet receiving.
By adding extra field vnet_hdr_sz with utilizing holes in struct packet_sock to record currently used virtio net header size and supporting extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet sockets can know the exact length of virtio net header that virtio user gives. In packet_snd, tpacket_snd and packet_recvmsg, instead of using hardcoded virtio net header size, it can get the exact vnet_hdr_sz from corresponding packet_sock, and parse mac header correctly based on this information to avoid the packets being mistakenly dropped.
Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com> Co-developed-by: Anqi Shen <amy.saq@antgroup.com> Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3948b059 |
| 23-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
net: introduce a config option to tweak MAX_SKB_FRAGS
Currently, MAX_SKB_FRAGS value is 17.
For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuff
net: introduce a config option to tweak MAX_SKB_FRAGS
Currently, MAX_SKB_FRAGS value is 17.
For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag.
But with zero copy, we use order-0 pages.
For BIG TCP to show its full potential, we add a config option to be able to fit up to 45 segments per skb.
This is also needed for BIG TCP rx zerocopy, as zerocopy currently does not support skbs with frag list.
We have used MAX_SKB_FRAGS=45 value for years at Google before we deployed 4K MTU, with no adverse effect, other than a recent issue in mlx4, fixed in commit 26782aad00cc ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")
Back then, goal was to be able to receive full size (64KB) GRO packets without the frag_list overhead.
Note that /proc/sys/net/core/max_skb_frags can also be used to limit the number of fragments TCP can use in tx packets.
By default we keep the old/legacy value of 17 until we get more coverage for the updated values.
Sizes of struct skb_shared_info on 64bit arches
MAX_SKB_FRAGS | sizeof(struct skb_shared_info): ============================================== 17 320 21 320+64 = 384 25 320+128 = 448 29 320+192 = 512 33 320+256 = 576 37 320+320 = 640 41 320+384 = 704 45 320+448 = 768
This inflation might cause problems for drivers assuming they could pack both the incoming packet (for MTU=1500) and skb_shared_info in half a page, using build_skb().
v3: fix build error when CONFIG_NET=n v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20230323162842.1935061-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
105a201e |
| 17-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
net/packet: remove po->xmit
Use PACKET_SOCK_QDISC_BYPASS atomic bit instead of a pointer.
This removes one indirect call in fast path, and READ_ONCE()/WRITE_ONCE() annotations as well.
Signed-off-
net/packet: remove po->xmit
Use PACKET_SOCK_QDISC_BYPASS atomic bit instead of a pointer.
This removes one indirect call in fast path, and READ_ONCE()/WRITE_ONCE() annotations as well.
Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
791a3e9f |
| 16-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
net/packet: convert po->pressure to an atomic flag
Not only this removes some READ_ONCE()/WRITE_ONCE(), this also removes one integer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-b
net/packet: convert po->pressure to an atomic flag
Not only this removes some READ_ONCE()/WRITE_ONCE(), this also removes one integer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
61edf479 |
| 16-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
net/packet: convert po->running to an atomic flag
Instead of consuming 32 bits for po->running, use one available bit in po->flags.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by:
net/packet: convert po->running to an atomic flag
Instead of consuming 32 bits for po->running, use one available bit in po->flags.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|