#
755662ce |
| 24-Nov-2021 |
Kuniyuki Iwashima <kuniyu@amazon.co.jp> |
af_unix: Use offsetof() instead of sizeof().
The length of the AF_UNIX socket address contains an offset to the member sun_path of struct sockaddr_un.
Currently, the preceding member is just sun_fa
af_unix: Use offsetof() instead of sizeof().
The length of the AF_UNIX socket address contains an offset to the member sun_path of struct sockaddr_un.
Currently, the preceding member is just sun_family, and its type is sa_family_t and resolved to short. Therefore, the offset is represented by sizeof(short). However, it is not clear and fragile to changes in struct sockaddr_storage or sockaddr_un.
This commit makes it clear and robust by rewriting sizeof() with offsetof().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f9390b24 |
| 19-Nov-2021 |
Vincent Whitchurch <vincent.whitchurch@axis.com> |
af_unix: fix regression in read after shutdown
On kernels before v5.15, calling read() on a unix socket after shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data previously written or EOF
af_unix: fix regression in read after shutdown
On kernels before v5.15, calling read() on a unix socket after shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data previously written or EOF. But now, while read() after shutdown(SHUT_RD) still behaves the same way, read() after shutdown(SHUT_RDWR) always fails with -EINVAL.
This behaviour change was apparently inadvertently introduced as part of a bug fix for a different regression caused by the commit adding sockmap support to af_unix, commit 94531cfcbe79c359 ("af_unix: Add unix_stream_proto for sockmap"). Those commits, for unclear reasons, started setting the socket state to TCP_CLOSE on shutdown(SHUT_RDWR), while this state change had previously only been done in unix_release_sock().
Restore the original behaviour. The sockmap tests in tests/selftests/bpf continue to pass after this patch.
Fixes: d0c6416bd7091647f60 ("unix: Fix an issue in unix_shutdown causing the other end read/write failures") Link: https://lore.kernel.org/lkml/20211111140000.GA10779@axis.com/ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b3cb764a |
| 15-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: drop nopreempt requirement on sock_prot_inuse_add()
This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds mo
net: drop nopreempt requirement on sock_prot_inuse_add()
This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
af493388 |
| 08-Oct-2021 |
Cong Wang <cong.wang@bytedance.com> |
net: Implement ->sock_is_readable() for UDP and AF_UNIX
Yucong noticed we can't poll() sockets in sockmap even when they are the destination sockets of redirections. This is because we never poll an
net: Implement ->sock_is_readable() for UDP and AF_UNIX
Yucong noticed we can't poll() sockets in sockmap even when they are the destination sockets of redirections. This is because we never poll any psock queues in ->poll(), except for TCP. With ->sock_is_readable() now we can overwrite >sock_is_readable(), invoke and implement it for both UDP and AF_UNIX sockets.
Reported-by: Yucong Sun <sunyucong@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
show more ...
|
#
0edf0824 |
| 08-Oct-2021 |
Stephen Boyd <swboyd@chromium.org> |
af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability
Then name of this protocol changed in commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") because that commit ad
af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability
Then name of this protocol changed in commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") because that commit added stream support to the af_unix protocol. Renaming the existing protocol makes a ChromeOS protocol test[1] fail now that the name has changed in /proc/net/protocols from "UNIX" to "UNIX-DGRAM".
Let's put the name back to how it was while keeping the stream protocol as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes the test and maintains backwards compatibility in proc.
Cc: Jiang Wang <jiang.wang@bytedance.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Dmitry Osipenko <digetx@gmail.com> Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1] Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d0c6416b |
| 04-Oct-2021 |
Jiang Wang <jiang.wang@bytedance.com> |
unix: Fix an issue in unix_shutdown causing the other end read/write failures
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") sets unix domain socket peer state to TCP_CLOSE in un
unix: Fix an issue in unix_shutdown causing the other end read/write failures
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") sets unix domain socket peer state to TCP_CLOSE in unix_shutdown. This could happen when the local end is shutdown but the other end is not. Then, the other end will get read or write failures which is not expected. Fix the issue by setting the local state to shutdown.
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Casey Schaufler <casey@schaufler-ca.com> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211004232530.2377085-1-jiang.wang@bytedance.com
show more ...
|
#
35306eb2 |
| 29-Sep-2021 |
Eric Dumazet <edumazet@google.com> |
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently reading sk->sk_peer_pid which makes no sense, as this field is only possibly set by AF_UNIX sockets. We will have to clean this in a separate patch. This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback" or implementing what was truly expected.
Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jann Horn <jannh@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f4bd73b5 |
| 28-Sep-2021 |
Kuniyuki Iwashima <kuniyu@amazon.co.jp> |
af_unix: Return errno instead of NULL in unix_create1().
unix_create1() returns NULL on error, and the callers assume that it never fails for reasons other than out of memory. So, the callers alway
af_unix: Return errno instead of NULL in unix_create1().
unix_create1() returns NULL on error, and the callers assume that it never fails for reasons other than out of memory. So, the callers always return -ENOMEM when unix_create1() fails.
However, it also returns NULL when the number of af_unix sockets exceeds twice the limit controlled by sysctl: fs.file-max. In this case, the callers should return -ENFILE like alloc_empty_file().
This patch changes unix_create1() to return the correct error value instead of NULL on error.
Out of curiosity, the assumption has been wrong since 1999 due to this change introduced in 2.2.4 [0].
diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c --- v2.2.3/linux/net/unix/af_unix.c Tue Jan 19 11:32:53 1999 +++ linux/net/unix/af_unix.c Sun Mar 21 07:22:00 1999 @@ -388,6 +413,9 @@ { struct sock *sk;
+ if (atomic_read(&unix_nr_socks) >= 2*max_files) + return NULL; + MOD_INC_USE_COUNT; sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1); if (!sk) {
[0]: https://cdn.kernel.org/pub/linux/kernel/v2.2/patch-2.2.4.gz
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
04f08eb4 |
| 09-Sep-2021 |
Eric Dumazet <edumazet@google.com> |
net/af_unix: fix a data-race in unix_dgram_poll
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing skb head qlen.
Also, change unix_dgram
net/af_unix: fix a data-race in unix_dgram_poll
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing skb head qlen.
Also, change unix_dgram_poll() to use lockless version of unix_recvq_full()
It is verry possible we can switch all/most unix_recvq_full() to the lockless version, this will be done in a future kernel version.
[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1
BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll
write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0: __skb_insert include/linux/skbuff.h:1938 [inline] __skb_queue_before include/linux/skbuff.h:2043 [inline] __skb_queue_tail include/linux/skbuff.h:2076 [inline] skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264 unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2532 __do_sys_sendmmsg net/socket.c:2561 [inline] __se_sys_sendmmsg net/socket.c:2558 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1: skb_queue_len include/linux/skbuff.h:1869 [inline] unix_recvq_full net/unix/af_unix.c:194 [inline] unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777 sock_poll+0x23e/0x260 net/socket.c:1288 vfs_poll include/linux/poll.h:90 [inline] ep_item_poll fs/eventpoll.c:846 [inline] ep_send_events fs/eventpoll.c:1683 [inline] ep_poll fs/eventpoll.c:1798 [inline] do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226 __do_sys_epoll_wait fs/eventpoll.c:2238 [inline] __se_sys_epoll_wait fs/eventpoll.c:2233 [inline] __x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000001b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()") Cc: Qian Cai <cai@lca.pw> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
dc56ad70 |
| 30-Aug-2021 |
Eric Dumazet <edumazet@google.com> |
af_unix: fix potential NULL deref in unix_dgram_connect()
syzbot was able to trigger NULL deref in unix_dgram_connect() [1]
This happens in
if (unix_peer(sk)) sk->sk_state = other->sk_state = T
af_unix: fix potential NULL deref in unix_dgram_connect()
syzbot was able to trigger NULL deref in unix_dgram_connect() [1]
This happens in
if (unix_peer(sk)) sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL
Because locks have been dropped, unix_peer() might be non NULL, while @other is NULL (AF_UNSPEC case)
We need to move code around, so that we no longer access unix_peer() and sk_state while locks have been released.
[1] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __sys_connect_file+0x155/0x1a0 net/socket.c:1890 __sys_connect+0x161/0x190 net/socket.c:1907 __do_sys_connect net/socket.c:1917 [inline] __se_sys_connect net/socket.c:1914 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1914 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x446a89 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89 RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000 R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000 Modules linked in: ---[ end trace 4eb809357514968c ]--- RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226 Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00 RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3 R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740 R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0 FS: 00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cong.wang@bytedance.com> Cc: Alexei Starovoitov <ast@kernel.org> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d359902d |
| 21-Aug-2021 |
Jiang Wang <jiang.wang@bytedance.com> |
af_unix: Fix NULL pointer bug in unix_shutdown
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function w
af_unix: Fix NULL pointer bug in unix_shutdown
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function will call prot->unhash(), which is NULL for SEQPACKET. And kernel will panic. On ARM32, it will show following messages: (it likely affects x86 too).
Fix the bug by checking the prot->unhash is NULL or not first.
Kernel log: <--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 2fba1ffb *pgd=00000000 Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2 Modules linked in: CPU: 1 PID: 1999 Comm: falkon Tainted: G W 5.14.0-rc5-01175-g94531cfcbe79-dirty #9240 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) PC is at 0x0 LR is at unix_shutdown+0x81/0x1a8 pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013 sp : e45aff70 ip : e463a3c0 fp : beb54f04 r10: 00000125 r9 : e45ae000 r8 : c4a56664 r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400 r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 05aa804a DAC: 00000051 Register r0 information: slab PING start c4a56400 pointer offset 0 Register r1 information: NULL pointer Register r2 information: slab task_struct start c5a6b180 pointer offset 0 Register r3 information: NULL pointer Register r4 information: slab PING start c4a56400 pointer offset 0 Register r5 information: non-paged memory Register r6 information: slab PING start c4a56400 pointer offset 100 Register r7 information: non-paged memory Register r8 information: slab PING start c4a56400 pointer offset 612 Register r9 information: non-slab/vmalloc memory Register r10 information: non-paged memory Register r11 information: non-paged memory Register r12 information: slab filp start e463a3c0 pointer offset 0 Process falkon (pid: 1999, stack limit = 0x9ec48895) Stack: (0xe45aff70 to 0xe45b0000) ff60: e45ae000 c5f26a00 00000000 00000125 ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4 ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04 ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000 [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50) [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16) Exception stack(0xe45affa8 to 0xe45afff0)
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com
show more ...
|
#
94531cfc |
| 16-Aug-2021 |
Jiang Wang <jiang.wang@bytedance.com> |
af_unix: Add unix_stream_proto for sockmap
Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To suppor
af_unix: Add unix_stream_proto for sockmap
Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto.
Also implement stream related sockmap functions. And add dgram key words to those dgram specific functions.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
show more ...
|
#
77462de1 |
| 16-Aug-2021 |
Jiang Wang <jiang.wang@bytedance.com> |
af_unix: Add read_sock for stream socket types
To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets.
Signed-off-by: Jiang Wang <
af_unix: Add read_sock for stream socket types
To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com
show more ...
|
#
19eed721 |
| 13-Aug-2021 |
Rao Shoaib <Rao.Shoaib@oracle.com> |
af_unix: check socket state when queuing OOB
edumazet@google.com pointed out that queue_oob does not check socket state after acquiring the lock. He also pointed to an incorrect usage of kfree_skb a
af_unix: check socket state when queuing OOB
edumazet@google.com pointed out that queue_oob does not check socket state after acquiring the lock. He also pointed to an incorrect usage of kfree_skb and an unnecessary setting of skb length. This patch addresses those issue.
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
2c860a43 |
| 14-Aug-2021 |
Kuniyuki Iwashima <kuniyu@amazon.co.jp> |
bpf: af_unix: Implement BPF iterator for UNIX domain socket.
This patch implements the BPF iterator for the UNIX domain socket.
Currently, the batch optimisation introduced for the TCP iterator in
bpf: af_unix: Implement BPF iterator for UNIX domain socket.
This patch implements the BPF iterator for the UNIX domain socket.
Currently, the batch optimisation introduced for the TCP iterator in the commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not used for the UNIX domain socket. It will require replacing the big lock for the hash table with small locks for each hash list not to block other processes.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210814015718.42704-2-kuniyu@amazon.co.jp
show more ...
|
#
876c14ad |
| 11-Aug-2021 |
Rao Shoaib <rao.shoaib@oracle.com> |
af_unix: fix holding spinlock in oob handling
syzkaller found that OOB code was holding spinlock while calling a function in which it could sleep.
Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller
af_unix: fix holding spinlock in oob handling
syzkaller found that OOB code was holding spinlock while calling a function in which it could sleep.
Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller.appspotmail.com Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Link: https://lore.kernel.org/r/20210811220652.567434-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
314001f0 |
| 01-Aug-2021 |
Rao Shoaib <rao.shoaib@oracle.com> |
af_unix: Add OOB support
This patch adds OOB support for AF_UNIX sockets. The semantics is same as TCP.
The last byte of a message with the OOB flag is treated as the OOB byte. The byte is separate
af_unix: Add OOB support
This patch adds OOB support for AF_UNIX sockets. The semantics is same as TCP.
The last byte of a message with the OOB flag is treated as the OOB byte. The byte is separated into a skb and a pointer to the skb is stored in unix_sock. The pointer is used to enforce OOB semantics.
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
cbcf0112 |
| 28-Jul-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
af_unix: fix garbage collect vs MSG_PEEK
unix_gc() assumes that candidate sockets can never gain an external reference (i.e. be installed into an fd) while the unix_gc_lock is held. Except for MSG
af_unix: fix garbage collect vs MSG_PEEK
unix_gc() assumes that candidate sockets can never gain an external reference (i.e. be installed into an fd) while the unix_gc_lock is held. Except for MSG_PEEK this is guaranteed by modifying inflight count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file count is not), yet it needs to be serialized with garbage collection. Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage collection is run completely before or completely after the barrier.
Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
9825d866 |
| 04-Jul-2021 |
Cong Wang <cong.wang@bytedance.com> |
af_unix: Implement unix_dgram_bpf_recvmsg()
We have to implement unix_dgram_bpf_recvmsg() to replace the original ->recvmsg() to retrieve skmsg from ingress_msg.
AF_UNIX is again special here becau
af_unix: Implement unix_dgram_bpf_recvmsg()
We have to implement unix_dgram_bpf_recvmsg() to replace the original ->recvmsg() to retrieve skmsg from ingress_msg.
AF_UNIX is again special here because the lack of sk_prot->recvmsg(). I simply add a special case inside unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-8-xiyou.wangcong@gmail.com
show more ...
|
#
c6382918 |
| 04-Jul-2021 |
Cong Wang <cong.wang@bytedance.com> |
af_unix: Implement ->psock_update_sk_prot()
Now we can implement unix_bpf_update_proto() to update sk_prot, especially prot->close().
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-b
af_unix: Implement ->psock_update_sk_prot()
Now we can implement unix_bpf_update_proto() to update sk_prot, especially prot->close().
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-7-xiyou.wangcong@gmail.com
show more ...
|
#
c7272e15 |
| 04-Jul-2021 |
Cong Wang <cong.wang@bytedance.com> |
af_unix: Add a dummy ->close() for sockmap
Unlike af_inet, unix_proto is very different, it does not even have a ->close(). We have to add a dummy implementation to satisfy sockmap. Normally it is j
af_unix: Add a dummy ->close() for sockmap
Unlike af_inet, unix_proto is very different, it does not even have a ->close(). We have to add a dummy implementation to satisfy sockmap. Normally it is just a nop, it is introduced only for sockmap to replace it.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-6-xiyou.wangcong@gmail.com
show more ...
|
#
83301b53 |
| 04-Jul-2021 |
Cong Wang <cong.wang@bytedance.com> |
af_unix: Set TCP_ESTABLISHED for datagram sockets too
Currently only unix stream socket sets TCP_ESTABLISHED, datagram socket can set this too when they connect to its peer socket. At least __ip4_da
af_unix: Set TCP_ESTABLISHED for datagram sockets too
Currently only unix stream socket sets TCP_ESTABLISHED, datagram socket can set this too when they connect to its peer socket. At least __ip4_datagram_connect() does the same.
This will be used to determine whether an AF_UNIX datagram socket can be redirected to in sockmap.
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-5-xiyou.wangcong@gmail.com
show more ...
|
#
29df44fa |
| 04-Jul-2021 |
Cong Wang <cong.wang@bytedance.com> |
af_unix: Implement ->read_sock() for sockmap
Implement ->read_sock() for AF_UNIX datagram socket, it is pretty much similar to udp_read_sock().
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Si
af_unix: Implement ->read_sock() for sockmap
Implement ->read_sock() for AF_UNIX datagram socket, it is pretty much similar to udp_read_sock().
Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-4-xiyou.wangcong@gmail.com
show more ...
|
#
e3ae2365 |
| 27-Jun-2021 |
Alexander Aring <aahringo@redhat.com> |
net: sock: introduce sk_error_report
This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, f
net: sock: introduce sk_error_report
This patch introduces a function wrapper to call the sk_error_report callback. That will prepare to add additional handling whenever sk_error_report is called, for example to trace socket errors.
Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
be752283 |
| 19-Jun-2021 |
Al Viro <viro@zeniv.linux.org.uk> |
__unix_find_socket_byname(): don't pass hash and type separately
We only care about exclusive or of those, so pass that directly. Makes life simpler for callers as well...
Signed-off-by: Al Viro <v
__unix_find_socket_byname(): don't pass hash and type separately
We only care about exclusive or of those, so pass that directly. Makes life simpler for callers as well...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|