History log of /linux/net/netlink/af_netlink.c (Results 1 – 25 of 444)
Revision Date Author Comments
# 8af4f604 20-Apr-2024 Jakub Kicinski <kuba@kernel.org>

netlink: support all extack types in dumps

Note that when this commit message refers to netlink dump
it only means the actual dumping part, the parsing / dump
start is handled by the same code as "d

netlink: support all extack types in dumps

Note that when this commit message refers to netlink dump
it only means the actual dumping part, the parsing / dump
start is handled by the same code as "doit".

Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers")
added support for returning extack messages from dump handlers,
but left out other extack info, e.g. bad attribute.

This used to be fine because until YNL we had little practical
use for the machine readable attributes, and only messages were
used in practice.

YNL flips the preference 180 degrees, it's now much more useful
to point to a bad attr with NL_SET_BAD_ATTR() than type
an English message saying "attribute XYZ is $reason-why-bad".

Support all of extack. The fact that extack only gets added if
it fits remains unaddressed.

Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 652332e3 20-Apr-2024 Jakub Kicinski <kuba@kernel.org>

netlink: move extack writing helpers

Next change will need them in netlink_dump_done(), pure move.

Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240420023543.33003

netlink: move extack writing helpers

Next change will need them in netlink_dump_done(), pure move.

Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 5bc63d3a 29-Mar-2024 Jakub Kicinski <kuba@kernel.org>

netlink: create a new header for internal genetlink symbols

There are things in linux/genetlink.h which are only used
under net/netlink/. Move them to a new local header.
A new header with just 2 ex

netlink: create a new header for internal genetlink symbols

There are things in linux/genetlink.h which are only used
under net/netlink/. Move them to a new local header.
A new header with just 2 externs isn't great, but alternative
would be to include af_netlink.h in genetlink.c which feels
even worse.

Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 8b6d307f 08-Mar-2024 Juntong Deng <juntong.deng@outlook.com>

net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID

Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID,
and we are unable to get the value of NETLINK_LISTEN_ALL_NSID
socket op

net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID

Currently getsockopt does not support NETLINK_LISTEN_ALL_NSID,
and we are unable to get the value of NETLINK_LISTEN_ALL_NSID
socket option through getsockopt.

This patch adds getsockopt support for NETLINK_LISTEN_ALL_NSID.

Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Link: https://lore.kernel.org/r/AM6PR03MB58482322B7B335308DA56FE599272@AM6PR03MB5848.eurprd03.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# b5a89915 03-Mar-2024 Jakub Kicinski <kuba@kernel.org>

netlink: handle EMSGSIZE errors in the core

Eric points out that our current suggested way of handling
EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
break if we didn't fit even a single

netlink: handle EMSGSIZE errors in the core

Eric points out that our current suggested way of handling
EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
break if we didn't fit even a single object into the buffer
provided by the user. This should not happen for well behaved
applications, but we can fix that, and free netlink families
from dealing with that completely by moving error handling
into the core.

Let's assume from now on that all EMSGSIZE errors in dumps are
because we run out of skb space. Families can now propagate
the error nla_put_*() etc generated and not worry about any
return value magic. If some family really wants to send EMSGSIZE
to user space, assuming it generates the same error on the next
dump iteration the skb->len should be 0, and user space should
still see the EMSGSIZE.

This should simplify families and prevent mistakes in return
values which lead to DONE being forced into a separate recv()
call as discovered by Ido some time ago.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# f8cbf6bd 24-Feb-2024 Eric Dumazet <edumazet@google.com>

netlink: use kvmalloc() in netlink_alloc_large_skb()

This is a followup of commit 234ec0b6034b ("netlink: fix potential
sleeping issue in mqueue_flush_file"), because vfree_atomic()
overhead is unfo

netlink: use kvmalloc() in netlink_alloc_large_skb()

This is a followup of commit 234ec0b6034b ("netlink: fix potential
sleeping issue in mqueue_flush_file"), because vfree_atomic()
overhead is unfortunate for medium sized allocations.

1) If the allocation is smaller than PAGE_SIZE, do not bother
with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
while NLMSG_GOODSIZE is smaller than 8KB.

2) Use kvmalloc(), which might allocate one high order page
instead of vmalloc if memory is not too fragmented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 386520e0 22-Feb-2024 Eric Dumazet <edumazet@google.com>

rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag

Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt-out from RTNL p

rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flag

Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag
allows dump operations registered via rtnl_register()
or rtnl_register_module() to opt-out from RTNL protection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# e39951d9 22-Feb-2024 Eric Dumazet <edumazet@google.com>

rtnetlink: change nlk->cb_mutex role

In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and

rtnetlink: change nlk->cb_mutex role

In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it"), Patrick McHardy used
a common mutex to protect both nlk->cb and the dump() operations.

The override is used for rtnl dumps, registered with
rntl_register() and rntl_register_module().

We want to be able to opt-out some dump() operations
to not acquire RTNL, so we need to protect nlk->cb
with a per socket mutex.

This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex

The optional pointer to the mutex used to protect dump()
call is stored in nlk->dump_cb_mutex

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# b5590270 22-Feb-2024 Eric Dumazet <edumazet@google.com>

netlink: hold nlk->cb_mutex longer in __netlink_dump_start()

__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KA

netlink: hold nlk->cb_mutex longer in __netlink_dump_start()

__netlink_dump_start() releases nlk->cb_mutex right before
calling netlink_dump() which grabs it again.

This seems dangerous, even if KASAN did not bother yet.

Add a @lock_taken parameter to netlink_dump() to let it
grab the mutex if called from netlink_recvmsg() only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 661779e1 21-Feb-2024 Ryosuke Yasuoka <ryasuoka@redhat.com>

netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter

syzbot reported the following uninit-value access issue [1]:

netlink_to_full_skb() creates a new `skb` and puts the `skb->data`
passed

netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter

syzbot reported the following uninit-value access issue [1]:

netlink_to_full_skb() creates a new `skb` and puts the `skb->data`
passed as a 1st arg of netlink_to_full_skb() onto new `skb`. The data
size is specified as `len` and passed to skb_put_data(). This `len`
is based on `skb->end` that is not data offset but buffer offset. The
`skb->end` contains data and tailroom. Since the tailroom is not
initialized when the new `skb` created, KMSAN detects uninitialized
memory area when copying the data.

This patch resolved this issue by correct the len from `skb->end` to
`skb->len`, which is the actual data offset.

BUG: KMSAN: kernel-infoleak-after-free in instrument_copy_to_user include/linux/instrumented.h:114 [inline]
BUG: KMSAN: kernel-infoleak-after-free in copy_to_user_iter lib/iov_iter.c:24 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_ubuf include/linux/iov_iter.h:29 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
BUG: KMSAN: kernel-infoleak-after-free in iterate_and_advance include/linux/iov_iter.h:271 [inline]
BUG: KMSAN: kernel-infoleak-after-free in _copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
instrument_copy_to_user include/linux/instrumented.h:114 [inline]
copy_to_user_iter lib/iov_iter.c:24 [inline]
iterate_ubuf include/linux/iov_iter.h:29 [inline]
iterate_and_advance2 include/linux/iov_iter.h:245 [inline]
iterate_and_advance include/linux/iov_iter.h:271 [inline]
_copy_to_iter+0x364/0x2520 lib/iov_iter.c:186
copy_to_iter include/linux/uio.h:197 [inline]
simple_copy_to_iter+0x68/0xa0 net/core/datagram.c:532
__skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:420
skb_copy_datagram_iter+0x5c/0x200 net/core/datagram.c:546
skb_copy_datagram_msg include/linux/skbuff.h:3960 [inline]
packet_recvmsg+0xd9c/0x2000 net/packet/af_packet.c:3482
sock_recvmsg_nosec net/socket.c:1044 [inline]
sock_recvmsg net/socket.c:1066 [inline]
sock_read_iter+0x467/0x580 net/socket.c:1136
call_read_iter include/linux/fs.h:2014 [inline]
new_sync_read fs/read_write.c:389 [inline]
vfs_read+0x8f6/0xe00 fs/read_write.c:470
ksys_read+0x20f/0x4c0 fs/read_write.c:613
__do_sys_read fs/read_write.c:623 [inline]
__se_sys_read fs/read_write.c:621 [inline]
__x64_sys_read+0x93/0xd0 fs/read_write.c:621
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was stored to memory at:
skb_put_data include/linux/skbuff.h:2622 [inline]
netlink_to_full_skb net/netlink/af_netlink.c:181 [inline]
__netlink_deliver_tap_skb net/netlink/af_netlink.c:298 [inline]
__netlink_deliver_tap+0x5be/0xc90 net/netlink/af_netlink.c:325
netlink_deliver_tap net/netlink/af_netlink.c:338 [inline]
netlink_deliver_tap_kernel net/netlink/af_netlink.c:347 [inline]
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x10f1/0x1250 net/netlink/af_netlink.c:1368
netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
__sys_sendmsg net/socket.c:2667 [inline]
__do_sys_sendmsg net/socket.c:2676 [inline]
__se_sys_sendmsg net/socket.c:2674 [inline]
__x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Uninit was created at:
free_pages_prepare mm/page_alloc.c:1087 [inline]
free_unref_page_prepare+0xb0/0xa40 mm/page_alloc.c:2347
free_unref_page_list+0xeb/0x1100 mm/page_alloc.c:2533
release_pages+0x23d3/0x2410 mm/swap.c:1042
free_pages_and_swap_cache+0xd9/0xf0 mm/swap_state.c:316
tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
tlb_flush_mmu+0x6f5/0x980 mm/mmu_gather.c:300
tlb_finish_mmu+0x101/0x260 mm/mmu_gather.c:392
exit_mmap+0x49e/0xd30 mm/mmap.c:3321
__mmput+0x13f/0x530 kernel/fork.c:1349
mmput+0x8a/0xa0 kernel/fork.c:1371
exit_mm+0x1b8/0x360 kernel/exit.c:567
do_exit+0xd57/0x4080 kernel/exit.c:858
do_group_exit+0x2fd/0x390 kernel/exit.c:1021
__do_sys_exit_group kernel/exit.c:1032 [inline]
__se_sys_exit_group kernel/exit.c:1030 [inline]
__x64_sys_exit_group+0x3c/0x50 kernel/exit.c:1030
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Bytes 3852-3903 of 3904 are uninitialized
Memory access of size 3904 starts at ffff88812ea1e000
Data copied to user address 0000000020003280

CPU: 1 PID: 5043 Comm: syz-executor297 Not tainted 6.7.0-rc5-syzkaller-00047-g5bd7ef53ffe5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023

Fixes: 1853c9496460 ("netlink, mmap: transform mmap skb into full skb on taps")
Reported-and-tested-by: syzbot+34ad5fab48f7bf510349@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=34ad5fab48f7bf510349 [1]
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240221074053.1794118-1-ryasuoka@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 234ec0b6 22-Jan-2024 Zhengchao Shao <shaozhengchao@huawei.com>

netlink: fix potential sleeping issue in mqueue_flush_file

I analyze the potential sleeping issue of the following processes:
Thread A Thread B
...

netlink: fix potential sleeping issue in mqueue_flush_file

I analyze the potential sleeping issue of the following processes:
Thread A Thread B
... netlink_create //ref = 1
do_mq_notify ...
sock = netlink_getsockbyfilp ... //ref = 2
info->notify_sock = sock; ...
... netlink_sendmsg
... skb = netlink_alloc_large_skb //skb->head is vmalloced
... netlink_unicast
... sk = netlink_getsockbyportid //ref = 3
... netlink_sendskb
... __netlink_sendskb
... skb_queue_tail //put skb to sk_receive_queue
... sock_put //ref = 2
... ...
... netlink_release
... deferred_put_nlk_sk //ref = 1
mqueue_flush_file
spin_lock
remove_notification
netlink_sendskb
sock_put //ref = 0
sk_free
...
__sk_destruct
netlink_sock_destruct
skb_queue_purge //get skb from sk_receive_queue
...
__skb_queue_purge_reason
kfree_skb_reason
__kfree_skb
...
skb_release_all
skb_release_head_state
netlink_skb_destructor
vfree(skb->head) //sleeping while holding spinlock

In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc, and is put to sk_receive_queue queue, also the skb is not freed.
When the mqueue executes flush, the sleeping bug will occur. Use
vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.

Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


# 403863e9 16-Dec-2023 Jiri Pirko <jiri@nvidia.com>

netlink: introduce typedef for filter function

Make the code using filter function a bit nicer by consolidating the
filter function arguments using typedef.

Suggested-by: Andy Shevchenko <andriy.sh

netlink: introduce typedef for filter function

Make the code using filter function a bit nicer by consolidating the
filter function arguments using typedef.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


# ac40916a 15-Nov-2023 Li RongQing <lirongqing@baidu.com>

rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink

if a PF has 256 or more VFs, ip link command will allocate an order 3
memory or more, and maybe trigger OOM due to memory fragment,
th

rtnetlink: introduce nlmsg_new_large and use it in rtnl_getlink

if a PF has 256 or more VFs, ip link command will allocate an order 3
memory or more, and maybe trigger OOM due to memory fragment,
the VFs needed memory size is computed in rtnl_vfinfo_size.

so introduce nlmsg_new_large which calls netlink_alloc_large_skb in
which vmalloc is used for large memory, to avoid the failure of
allocating memory

ip invoked oom-killer: gfp_mask=0xc2cc0(GFP_KERNEL|__GFP_NOWARN|\
__GFP_COMP|__GFP_NOMEMALLOC), order=3, oom_score_adj=0
CPU: 74 PID: 204414 Comm: ip Kdump: loaded Tainted: P OE
Call Trace:
dump_stack+0x57/0x6a
dump_header+0x4a/0x210
oom_kill_process+0xe4/0x140
out_of_memory+0x3e8/0x790
__alloc_pages_slowpath.constprop.116+0x953/0xc50
__alloc_pages_nodemask+0x2af/0x310
kmalloc_large_node+0x38/0xf0
__kmalloc_node_track_caller+0x417/0x4d0
__kmalloc_reserve.isra.61+0x2e/0x80
__alloc_skb+0x82/0x1c0
rtnl_getlink+0x24f/0x370
rtnetlink_rcv_msg+0x12c/0x350
netlink_rcv_skb+0x50/0x100
netlink_unicast+0x1b2/0x280
netlink_sendmsg+0x355/0x4a0
sock_sendmsg+0x5b/0x60
____sys_sendmsg+0x1ea/0x250
___sys_sendmsg+0x88/0xd0
__sys_sendmsg+0x5e/0xa0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f95a65a5b70

Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20231115120108.3711-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# d0f95894 03-Oct-2023 Eric Dumazet <edumazet@google.com>

netlink: annotate data-races around sk->sk_err

syzbot caught another data-race in netlink when
setting sk->sk_err.

Annotate all of them for good measure.

BUG: KCSAN: data-race in netlink_recvmsg /

netlink: annotate data-races around sk->sk_err

syzbot caught another data-race in netlink when
setting sk->sk_err.

Annotate all of them for good measure.

BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1:
netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
__sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229
__do_sys_recvfrom net/socket.c:2247 [inline]
__se_sys_recvfrom net/socket.c:2243 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2243
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000000 -> 0x00000016

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 8fe08d70 11-Aug-2023 Eric Dumazet <edumazet@google.com>

netlink: convert nlk->flags to atomic flags

sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt()
and others use nlk->flags without correct locking.

Use set_bit(), clear_bit(), test_bit()

netlink: convert nlk->flags to atomic flags

sk_diag_put_flags(), netlink_setsockopt(), netlink_getsockopt()
and others use nlk->flags without correct locking.

Use set_bit(), clear_bit(), test_bit(), assign_bit() to remove
data-races.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# a4c9a56e 19-Jul-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

netlink: Add new netlink_release function

A new function netlink_release is added in netlink_sock to store the
protocol's release function. This is called when the socket is deleted.
This can be sup

netlink: Add new netlink_release function

A new function netlink_release is added in netlink_sock to store the
protocol's release function. This is called when the socket is deleted.
This can be supplied by the protocol via the release function in
netlink_kernel_cfg. This is being added for the NETLINK_CONNECTOR
protocol, so it can free it's data when socket is deleted.

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# a3377386 19-Jul-2023 Anjali Kulkarni <anjali.k.kulkarni@oracle.com>

netlink: Reverse the patch which removed filtering

To use filtering at the connector & cn_proc layers, we need to enable
filtering in the netlink layer. This reverses the patch which removed
netlink

netlink: Reverse the patch which removed filtering

To use filtering at the connector & cn_proc layers, we need to enable
filtering in the netlink layer. This reverses the patch which removed
netlink filtering - commit ID for that patch:
549017aa1bb7 (netlink: remove netlink_broadcast_filtered).

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# b8e39b38 10-Jul-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

netlink: Make use of __assign_bit() API

We have for some time the __assign_bit() API to replace open coded

if (foo)
__set_bit(n, bar);
else
__clear_bit(n, bar);

Use this API in the code. No

netlink: Make use of __assign_bit() API

We have for some time the __assign_bit() API to replace open coded

if (foo)
__set_bit(n, bar);
else
__clear_bit(n, bar);

Use this API in the code. No functional change intended.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Message-ID: <20230710100830.89936-2-andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


# dc97391e 23-Jun-2023 David Howells <dhowells@redhat.com>

sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)

Remove ->sendpage() and ->sendpage_locked(). sendmsg() with
MSG_SPLICE_PAGES should be used instead. This allows multiple pages an

sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)

Remove ->sendpage() and ->sendpage_locked(). sendmsg() with
MSG_SPLICE_PAGES should be used instead. This allows multiple pages and
multipage folios to be passed through.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 8d61f926 21-Jun-2023 Eric Dumazet <edumazet@google.com>

netlink: fix potential deadlock in netlink_set_err()

syzbot reported a possible deadlock in netlink_set_err() [1]

A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
for netlin

netlink: fix potential deadlock in netlink_set_err()

syzbot reported a possible deadlock in netlink_set_err() [1]

A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
for netlink_lock_table()") in netlink_lock_table()

This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
which were not covered by cited commit.

[1]

WARNING: possible irq lock inversion dependency detected
6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted

syz-executor.2/23011 just changed the state of lock:
ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
but this lock was taken by another, SOFTIRQ-safe lock in the past:
(&local->queue_stop_reason_lock){..-.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(nl_table_lock);
local_irq_disable();
lock(&local->queue_stop_reason_lock);
lock(nl_table_lock);
<Interrupt>
lock(&local->queue_stop_reason_lock);

*** DEADLOCK ***

Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c
Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230621154337.1668594-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 5ab8c41c 09-Jun-2023 Jakub Kicinski <kuba@kernel.org>

netlink: support extack in dump ->start()

Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers")
added extack support to netlink dumps. It was focused on rtnl
and since rtnl does not use ->st

netlink: support extack in dump ->start()

Commit 4a19edb60d02 ("netlink: Pass extack to dump handlers")
added extack support to netlink dumps. It was focused on rtnl
and since rtnl does not use ->start(), ->done() callbacks
it ignored those. Genetlink on the other hand uses ->start()
extensively, for parsing and input validation.

Pass the extact in via struct netlink_dump_control and link
it to cb for the time of ->start(). Both struct netlink_dump_control
and extack itself live on the stack so we can't keep the same
extack for the duration of the dump. This means that the extack
visible in ->start() and each ->dump() callbacks will be different.
Corner cases like reporting a warning message in DONE across dump
calls are still not supported.

We could put the extack (for dumps) in the socket struct,
but layering makes it slightly awkward (extack pointer is decided
before the DO / DUMP split).

The genetlink dump error extacks are now surfaced:

$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'request header missing'}

Previously extack was missing:

$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
error: -22

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# f4e45348 29-May-2023 Pedro Tammela <pctammela@mojatatu.com>

net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report

The current code for the length calculation wrongly truncates the reported
length of the groups array, causing an under report of the subscrib

net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report

The current code for the length calculation wrongly truncates the reported
length of the groups array, causing an under report of the subscribed
groups. To fix this, use 'BITS_TO_BYTES()' which rounds up the
division by 8.

Fixes: b42be38b2778 ("netlink: add API to retrieve all group memberships")
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# a939d149 09-May-2023 Eric Dumazet <edumazet@google.com>

netlink: annotate accesses to nlk->cb_running

Both netlink_recvmsg() and netlink_native_seq_show() read
nlk->cb_running locklessly. Use READ_ONCE() there.

Add corresponding WRITE_ONCE() to netlink_

netlink: annotate accesses to nlk->cb_running

Both netlink_recvmsg() and netlink_native_seq_show() read
nlk->cb_running locklessly. Use READ_ONCE() there.

Add corresponding WRITE_ONCE() to netlink_dump() and
__netlink_dump_start()

syzbot reported:
BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg

write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
__netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
netlink_dump_start include/linux/netlink.h:308 [inline]
rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1138
call_write_iter include/linux/fs.h:1851 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x463/0x760 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
____sys_recvmsg+0x2db/0x310 net/socket.c:2718
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00 -> 0x01

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# d913d32c 21-Apr-2023 Kuniyuki Iwashima <kuniyu@amazon.com>

netlink: Use copy_to_user() for optval in netlink_getsockopt().

Brad Spencer provided a detailed report [0] that when calling getsockopt()
for AF_NETLINK, some SOL_NETLINK options set only 1 byte ev

netlink: Use copy_to_user() for optval in netlink_getsockopt().

Brad Spencer provided a detailed report [0] that when calling getsockopt()
for AF_NETLINK, some SOL_NETLINK options set only 1 byte even though such
options require at least sizeof(int) as length.

The options return a flag value that fits into 1 byte, but such behaviour
confuses users who do not initialise the variable before calling
getsockopt() and do not strictly check the returned value as char.

Currently, netlink_getsockopt() uses put_user() to copy data to optlen and
optval, but put_user() casts the data based on the pointer, char *optval.
As a result, only 1 byte is set to optval.

To avoid this behaviour, we need to use copy_to_user() or cast optval for
put_user().

Note that this changes the behaviour on big-endian systems, but we document
that the size of optval is int in the man page.

$ man 7 netlink
...
Socket options
To set or get a netlink socket option, call getsockopt(2) to read
or setsockopt(2) to write the option with the option level argument
set to SOL_NETLINK. Unless otherwise noted, optval is a pointer to
an int.

Fixes: 9a4595bc7e67 ("[NETLINK]: Add set/getsockopt options to support more than 32 groups")
Fixes: be0c22a46cfb ("netlink: add NETLINK_BROADCAST_ERROR socket option")
Fixes: 38938bfe3489 ("netlink: add NETLINK_NO_ENOBUFS socket flag")
Fixes: 0a6a3a23ea6e ("netlink: add NETLINK_CAP_ACK socket option")
Fixes: 2d4bc93368f5 ("netlink: extended ACK reporting")
Fixes: 89d35528d17d ("netlink: Add new socket option to enable strict checking on dumps")
Reported-by: Brad Spencer <bspencer@blackberry.com>
Link: https://lore.kernel.org/netdev/ZD7VkNWFfp22kTDt@datsun.rim.net/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/20230421185255.94606-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# a1865f2e 03-Apr-2023 Eric Dumazet <edumazet@google.com>

netlink: annotate lockless accesses to nlk->max_recvmsg_len

syzbot reported a data-race in data-race in netlink_recvmsg() [1]

Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() a

netlink: annotate lockless accesses to nlk->max_recvmsg_len

syzbot reported a data-race in data-race in netlink_recvmsg() [1]

Indeed, netlink_recvmsg() can be run concurrently,
and netlink_dump() also needs protection.

[1]
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg

read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
__sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
__do_sys_recvfrom net/socket.c:2212 [inline]
__se_sys_recvfrom net/socket.c:2208 [inline]
__x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg net/socket.c:1038 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2720
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x0000000000000000 -> 0x0000000000001000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023

Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


12345678910>>...18