#
8af4f604 |
| 20-Apr-2024 |
Jakub Kicinski <kuba@kernel.org> |
netlink: support all extack types in dumps
Note that when this commit message refers to netlink dump it only means the actual dumping part, the parsing / dump start is handled by the same code as "d
netlink: support all extack types in dumps
Note that when this commit message refers to netlink dump it only means the actual dumping part, the parsing / dump start is handled by the same code as "doit".
Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers") added support for returning extack messages from dump handlers, but left out other extack info, e.g. bad attribute.
This used to be fine because until YNL we had little practical use for the machine readable attributes, and only messages were used in practice.
YNL flips the preference 180 degrees, it's now much more useful to point to a bad attr with NL_SET_BAD_ATTR() than type an English message saying "attribute XYZ is $reason-why-bad".
Support all of extack. The fact that extack only gets added if it fits remains unaddressed.
Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
652332e3 |
| 20-Apr-2024 |
Jakub Kicinski <kuba@kernel.org> |
netlink: move extack writing helpers
Next change will need them in netlink_dump_done(), pure move.
Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.33003
netlink: move extack writing helpers
Next change will need them in netlink_dump_done(), pure move.
Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
5bc63d3a |
| 29-Mar-2024 |
Jakub Kicinski <kuba@kernel.org> |
netlink: create a new header for internal genetlink symbols
There are things in linux/genetlink.h which are only used under net/netlink/. Move them to a new local header. A new header with just 2 ex
netlink: create a new header for internal genetlink symbols
There are things in linux/genetlink.h which are only used under net/netlink/. Move them to a new local header. A new header with just 2 externs isn't great, but alternative would be to include af_netlink.h in genetlink.c which feels even worse.
Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8b6d307f |
| 08-Mar-2024 |
Juntong Deng <juntong.deng@outlook.com> |
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID, and we are unable to get the value of NETLINK_LISTEN_ALL_NSID socket op
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID, and we are unable to get the value of NETLINK_LISTEN_ALL_NSID socket option through getsockopt.
This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Link: https://lore.kernel.org/r/AM6PR03MB58482322B7B335308DA56FE599272@AM6PR03MB5848.eurprd03.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
b5a89915 |
| 03-Mar-2024 |
Jakub Kicinski <kuba@kernel.org> |
netlink: handle EMSGSIZE errors in the core
Eric points out that our current suggested way of handling EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will break if we didn't fit even a single
netlink: handle EMSGSIZE errors in the core
Eric points out that our current suggested way of handling EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will break if we didn't fit even a single object into the buffer provided by the user. This should not happen for well behaved applications, but we can fix that, and free netlink families from dealing with that completely by moving error handling into the core.
Let's assume from now on that all EMSGSIZE errors in dumps are because we run out of skb space. Families can now propagate the error nla_put_*() etc generated and not worry about any return value magic. If some family really wants to send EMSGSIZE to user space, assuming it generates the same error on the next dump iteration the skb->len should be 0, and user space should still see the EMSGSIZE.
This should simplify families and prevent mistakes in return values which lead to DONE being forced into a separate recv() call as discovered by Ido some time ago.
Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f8cbf6bd |
| 24-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
netlink: use kvmalloc() in netlink_alloc_large_skb()
This is a followup of commit 234ec0b6034b ("netlink: fix potential sleeping issue in mqueue_flush_file"), because vfree_atomic() overhead is unfo
netlink: use kvmalloc() in netlink_alloc_large_skb()
This is a followup of commit 234ec0b6034b ("netlink: fix potential sleeping issue in mqueue_flush_file"), because vfree_atomic() overhead is unfortunate for medium sized allocations.
1) If the allocation is smaller than PAGE_SIZE, do not bother with vmalloc() at all. Some arches have 64KB PAGE_SIZE, while NLMSG_GOODSIZE is smaller than 8KB.
2) Use kvmalloc(), which might allocate one high order page instead of vmalloc if memory is not too fragmented.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
386520e0 |
| 22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL p
rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e39951d9 |
| 22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
rtnetlink: change nlk->cb_mutex role
In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and
rtnetlink: change nlk->cb_mutex role
In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations.
The override is used for rtnl dumps, registered with rntl_register() and rntl_register_module().
We want to be able to opt-out some dump() operations to not acquire RTNL, so we need to protect nlk->cb with a per socket mutex.
This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex
The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b5590270 |
| 22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again.
This seems dangerous, even if KA
netlink: hold nlk->cb_mutex longer in __netlink_dump_start()
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again.
This seems dangerous, even if KASAN did not bother yet.
Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
661779e1 |
| 21-Feb-2024 |
Ryosuke Yasuoka <ryasuoka@redhat.com> |
netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
syzbot reported the following uninit-value access issue [1]:
netlink_to_full_skb() creates a new `skb` and puts the `skb->data` passed
netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
syzbot reported the following uninit-value access issue [1]:
netlink_to_full_skb() creates a new `skb` and puts the `skb->data` passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data size is specified as `len` and passed to skb_put_data(). This `len` is based on `skb->end` that is not data offset but buffer offset. The `skb->end` contains data and tailroom. Since the tailroom is not initialized when the new `skb` created, KMSAN detects uninitialized memory area when copying the data.
This patch resolved this issue by correct the len from `skb->end` to `skb->len`, which is the actual data offset.
BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186 copy_to_iter include/linux/uio.h:197 [inline] simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532 __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420 skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546 skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline] packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg net/socket.c:1066 [inline] sock_read_iter+0x467/0x580 net/socket.c:1136 call_read_iter include/linux/fs.h:2014 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x8f6/0xe00 fs/read_write.c:470 ksys_read+0x20f/0x4c0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x93/0xd0 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was stored to memory at: skb_put_data include/linux/skbuff.h:2622 [inline] netlink_to_full_skb net/netlink/af_netlink.c:181 [inline] __netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline] __netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 [inline] netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline] netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at: free_pages_prepare mm/page_alloc.c:1087 [inline] free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347 free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533 release_pages+0x23d3/0x2410 mm/swap.c:1042 free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316 tlb_batch_pages_flush mm/mmu_gather.c:98 [inline] tlb_flush_mmu_free mm/mmu_gather.c:293 [inline] tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300 tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392 exit_mmap+0x49e/0xd30 mm/mmap.c:3321 __mmput+0x13f/0x530 kernel/fork.c:1349 mmput+0x8a/0xa0 kernel/fork.c:1371 exit_mm+0x1b8/0x360 kernel/exit.c:567 do_exit+0xd57/0x4080 kernel/exit.c:858 do_group_exit+0x2fd/0x390 kernel/exit.c:1021 __do_sys_exit_group kernel/exit.c:1032 [inline] __se_sys_exit_group kernel/exit.c:1030 [inline] __x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Bytes 3852-3903 of 3904 are uninitialized Memory access of size 3904 starts at ffff88812ea1e000 Data copied to user address 0000000020003280
CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Fixes: 1853c9496460 ("netlink, mmap: transform mmap skb into full skb on taps") Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1] Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
234ec0b6 |
| 22-Jan-2024 |
Zhengchao Shao <shaozhengchao@huawei.com> |
netlink: fix potential sleeping issue in mqueue_flush_file
I analyze the potential sleeping issue of the following processes: Thread A Thread B ...
netlink: fix potential sleeping issue in mqueue_flush_file
I analyze the potential sleeping issue of the following processes: Thread A Thread B ... netlink_create //ref = 1 do_mq_notify ... sock = netlink_getsockbyfilp ... //ref = 2 info->notify_sock = sock; ... ... netlink_sendmsg ... skb = netlink_alloc_large_skb //skb->head is vmalloced ... netlink_unicast ... sk = netlink_getsockbyportid //ref = 3 ... netlink_sendskb ... __netlink_sendskb ... skb_queue_tail //put skb to sk_receive_queue ... sock_put //ref = 2 ... ... ... netlink_release ... deferred_put_nlk_sk //ref = 1 mqueue_flush_file spin_lock remove_notification netlink_sendskb sock_put //ref = 0 sk_free ... __sk_destruct netlink_sock_destruct skb_queue_purge //get skb from sk_receive_queue ... __skb_queue_purge_reason kfree_skb_reason __kfree_skb ... skb_release_all skb_release_head_state netlink_skb_destructor vfree(skb->head) //sleeping while holding spinlock
In netlink_sendmsg, if the memory pointed to by skb->head is allocated by vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. When the mqueue executes flush, the sleeping bug will occur. Use vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
403863e9 |
| 16-Dec-2023 |
Jiri Pirko <jiri@nvidia.com> |
netlink: introduce typedef for filter function
Make the code using filter function a bit nicer by consolidating the filter function arguments using typedef.
Suggested-by: Andy Shevchenko <andriy.sh
netlink: introduce typedef for filter function
Make the code using filter function a bit nicer by consolidating the filter function arguments using typedef.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
ac40916a |
| 15-Nov-2023 |
Li RongQing <lirongqing@baidu.com> |
rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink
if a PF has 256 or more VFs, ip link command will allocate an order 3 memory or more, and maybe trigger OOM due to memory fragment, th
rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink
if a PF has 256 or more VFs, ip link command will allocate an order 3 memory or more, and maybe trigger OOM due to memory fragment, the VFs needed memory size is computed in rtnl_vfinfo_size.
so introduce nlmsg_new_large which calls netlink_alloc_large_skb in which vmalloc is used for large memory, to avoid the failure of allocating memory
ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\ __GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0 CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE Call Trace: dump_stack+0x57/0x6a dump_header+0x4a/0x210 oom_kill_process+0xe4/0x140 out_of_memory+0x3e8/0x790 __alloc_pages_slowpath.constprop.116+0x953/0xc50 __alloc_pages_nodemask+0x2af/0x310 kmalloc_large_node+0x38/0xf0 __kmalloc_node_track_caller+0x417/0x4d0 __kmalloc_reserve.isra.61+0x2e/0x80 __alloc_skb+0x82/0x1c0 rtnl_getlink+0x24f/0x370 rtnetlink_rcv_msg+0x12c/0x350 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1b2/0x280 netlink_sendmsg+0x355/0x4a0 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1ea/0x250 ___sys_sendmsg+0x88/0xd0 __sys_sendmsg+0x5e/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f95a65a5b70
Cc: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d0f95894 |
| 03-Oct-2023 |
Eric Dumazet <edumazet@google.com> |
netlink: annotate data-races around sk->sk_err
syzbot caught another data-race in netlink when setting sk->sk_err.
Annotate all of them for good measure.
BUG: KCSAN: data-race in netlink_recvmsg /
netlink: annotate data-races around sk->sk_err
syzbot caught another data-race in netlink when setting sk->sk_err.
Annotate all of them for good measure.
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00000000 -> 0x00000016
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8fe08d70 |
| 11-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
netlink: convert nlk->flags to atomic flags
sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt() and others use nlk->flags without correct locking.
Use set_bit(), clear_bit(), test_bit()
netlink: convert nlk->flags to atomic flags
sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt() and others use nlk->flags without correct locking.
Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove data-races.
Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a4c9a56e |
| 19-Jul-2023 |
Anjali Kulkarni <anjali.k.kulkarni@oracle.com> |
netlink: Add new netlink_release function
A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be sup
netlink: Add new netlink_release function
A new function netlink_release is added in netlink_sock to store the protocol's release function. This is called when the socket is deleted. This can be supplied by the protocol via the release function in netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR protocol, so it can free it's data when socket is deleted.
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a3377386 |
| 19-Jul-2023 |
Anjali Kulkarni <anjali.k.kulkarni@oracle.com> |
netlink: Reverse the patch which removed filtering
To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink
netlink: Reverse the patch which removed filtering
To use filtering at the connector & cn_proc layers, we need to enable filtering in the netlink layer. This reverses the patch which removed netlink filtering - commit ID for that patch: 549017aa1bb7 (netlink: remove netlink_broadcast_filtered).
Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b8e39b38 |
| 10-Jul-2023 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
netlink: Make use of __assign_bit() API
We have for some time the __assign_bit() API to replace open coded
if (foo) __set_bit(n, bar); else __clear_bit(n, bar);
Use this API in the code. No
netlink: Make use of __assign_bit() API
We have for some time the __assign_bit() API to replace open coded
if (foo) __set_bit(n, bar); else __clear_bit(n, bar);
Use this API in the code. No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Message-ID: <20230710100830.89936-2-andriy.shevchenko@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
dc97391e |
| 23-Jun-2023 |
David Howells <dhowells@redhat.com> |
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages an
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8d61f926 |
| 21-Jun-2023 |
Eric Dumazet <edumazet@google.com> |
netlink: fix potential deadlock in netlink_set_err()
syzbot reported a possible deadlock in netlink_set_err() [1]
A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs for netlin
netlink: fix potential deadlock in netlink_set_err()
syzbot reported a possible deadlock in netlink_set_err() [1]
A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()") in netlink_lock_table()
This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump() which were not covered by cited commit.
[1]
WARNING: possible irq lock inversion dependency detected 6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
syz-executor.2/23011 just changed the state of lock: ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612 but this lock was taken by another, SOFTIRQ-safe lock in the past: (&local->queue_stop_reason_lock){..-.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this: Possible interrupt unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(nl_table_lock); local_irq_disable(); lock(&local->queue_stop_reason_lock); lock(nl_table_lock); <Interrupt> lock(&local->queue_stop_reason_lock);
*** DEADLOCK ***
Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()") Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
5ab8c41c |
| 09-Jun-2023 |
Jakub Kicinski <kuba@kernel.org> |
netlink: support extack in dump ->start()
Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers") added extack support to netlink dumps. It was focused on rtnl and since rtnl does not use ->st
netlink: support extack in dump ->start()
Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers") added extack support to netlink dumps. It was focused on rtnl and since rtnl does not use ->start(), ->done() callbacks it ignored those. Genetlink on the other hand uses ->start() extensively, for parsing and input validation.
Pass the extact in via struct netlink_dump_control and link it to cb for the time of ->start(). Both struct netlink_dump_control and extack itself live on the stack so we can't keep the same extack for the duration of the dump. This means that the extack visible in ->start() and each ->dump() callbacks will be different. Corner cases like reporting a warning message in DONE across dump calls are still not supported.
We could put the extack (for dumps) in the socket struct, but layering makes it slightly awkward (extack pointer is decided before the DO / DUMP split).
The genetlink dump error extacks are now surfaced:
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 64 (48) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'request header missing'}
Previously extack was missing:
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get lib.ynl.NlError: Netlink error: Invalid argument nl_len = 36 (20) nl_flags = 0x100 nl_type = 2 error: -22
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f4e45348 |
| 29-May-2023 |
Pedro Tammela <pctammela@mojatatu.com> |
net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
The current code for the length calculation wrongly truncates the reported length of the groups array, causing an under report of the subscrib
net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
The current code for the length calculation wrongly truncates the reported length of the groups array, causing an under report of the subscribed groups. To fix this, use 'BITS_TO_BYTES()' which rounds up the division by 8.
Fixes: b42be38b2778 ("netlink: add API to retrieve all group memberships") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a939d149 |
| 09-May-2023 |
Eric Dumazet <edumazet@google.com> |
netlink: annotate accesses to nlk->cb_running
Both netlink_recvmsg() and netlink_native_seq_show() read nlk->cb_running locklessly. Use READ_ONCE() there.
Add corresponding WRITE_ONCE() to netlink_
netlink: annotate accesses to nlk->cb_running
Both netlink_recvmsg() and netlink_native_seq_show() read nlk->cb_running locklessly. Use READ_ONCE() there.
Add corresponding WRITE_ONCE() to netlink_dump() and __netlink_dump_start()
syzbot reported: BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg
write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0: __netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399 netlink_dump_start include/linux/netlink.h:308 [inline] rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] sock_write_iter+0x1aa/0x230 net/socket.c:1138 call_write_iter include/linux/fs.h:1851 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x463/0x760 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1: netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022 sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017 ____sys_recvmsg+0x2db/0x310 net/socket.c:2718 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00 -> 0x01
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d913d32c |
| 21-Apr-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
netlink: Use copy_to_user() for optval in netlink_getsockopt().
Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte ev
netlink: Use copy_to_user() for optval in netlink_getsockopt().
Brad Spencer provided a detailed report [0] that when calling getsockopt() for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such options require at least sizeof(int) as length.
The options return a flag value that fits into 1 byte, but such behaviour confuses users who do not initialise the variable before calling getsockopt() and do not strictly check the returned value as char.
Currently, netlink_getsockopt() uses put_user() to copy data to optlen and optval, but put_user() casts the data based on the pointer, char *optval. As a result, only 1 byte is set to optval.
To avoid this behaviour, we need to use copy_to_user() or cast optval for put_user().
Note that this changes the behaviour on big-endian systems, but we document that the size of optval is int in the man page.
$ man 7 netlink ... Socket options To set or get a netlink socket option, call getsockopt(2) to read or setsockopt(2) to write the option with the option level argument set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to an int.
Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups") Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option") Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag") Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option") Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting") Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps") Reported-by: Brad Spencer <bspencer@blackberry.com> Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a1865f2e |
| 03-Apr-2023 |
Eric Dumazet <edumazet@google.com> |
netlink: annotate lockless accesses to nlk->max_recvmsg_len
syzbot reported a data-race in data-race in netlink_recvmsg() [1]
Indeed, netlink_recvmsg() can be run concurrently, and netlink_dump() a
netlink: annotate lockless accesses to nlk->max_recvmsg_len
syzbot reported a data-race in data-race in netlink_recvmsg() [1]
Indeed, netlink_recvmsg() can be run concurrently, and netlink_dump() also needs protection.
[1] BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0: netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194 __do_sys_recvfrom net/socket.c:2212 [inline] __se_sys_recvfrom net/socket.c:2208 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1: netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg net/socket.c:1038 [inline] ____sys_recvmsg+0x156/0x310 net/socket.c:2720 ___sys_recvmsg net/socket.c:2762 [inline] do_recvmmsg+0x2e5/0x710 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000000000 -> 0x0000000000001000
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|