#
bb0cd11a |
| 03-May-2024 |
mvs <mvs@openbsd.org> |
Push solock() down to sosend() and remove it from the soreceive() paths for unix(4) sockets.
The transmission path of unix(4) sockets is already half-unlocked, because the connected peer is not locked by solock() during the sbappend*() call. Use the `sb_mtx' mutex(9) and the `sb_lock' rwlock(9) to protect both `so_snd' and `so_rcv'.
Since `so_snd' is protected by the `sb_mtx' mutex(9), re-locking is not required in uipc_rcvd().
Dispose of and clean up `so_rcv' directly in sofree(). Such a socket is almost dead and unlinked from everywhere, including a spliced peer, so a concurrent sotask() thread will just exit. This is required to keep the lock order between `i_lock' and `sb_lock'. It also removes the re-locking from sofree() for all sockets.
SB_OWNLOCK became redundant with SB_MTXLOCK, so remove it. SB_MTXLOCK was kept because the checks against SB_MTXLOCK within the sb*() routines are more consistent.
Feedback and ok bluhm
|
#
1baa51d9 |
| 11-Apr-2024 |
mvs <mvs@openbsd.org> |
Don't take solock() in soreceive() for SOCK_RAW inet sockets.
For inet sockets, solock() is a wrapper around the netlock, so soreceive() can run simultaneously with exclusively locked code paths.
These sockets are not connection oriented: they don't call pru_rcvd(), they can't be spliced, and they don't set `so_error'. There is nothing for solock() to protect in the soreceive() path.
The `so_rcv' buffer is protected by the `sb_mtx' mutex(9), but since that mutex is released during sleeps, sblock() is required to serialize concurrent soreceive() and sorflush() threads. The current sblock() is a kind of rwlock(9) implementation, so introduce the `sb_lock' rwlock(9) and use it directly for that purpose.
sorflush() and its callers were refactored to avoid solock() for raw inet sockets, so packet processing is not stopped.
Tested and ok bluhm.
|
#
b11de491 |
| 10-Apr-2024 |
mvs <mvs@openbsd.org> |
Remove `head' socket re-locking in sonewconn().
uipc_attach() releases solock() because it should be taken after the `unp_gc_lock' rwlock(9), which protects the `unp_link' list. For this reason, the listening `head' socket also had to be unlocked while sonewconn() called uipc_attach(). This can now be reworked, because the `so_rcv' sockbuf relies on the `sb_mtx' mutex(9).
The last `unp_link' foreach loop within unp_gc() discards sockets previously marked UNP_GCDEAD. These sockets are not accessed from userland. The only exception is the sosend() threads of connected sending peers, but they only sbappend*() mbuf(9)s to `so_rcv'. So it is enough to unlink the mbuf(9) chain with `sb_mtx' held and then discard it locklessly.
Please note that the existing SS_NEWCONN_WAIT logic was never used: the listening unix(4) socket is protected from concurrent unp_detach() by the vnode(9) lock, yet `head' was re-locked every time anyway.
ok bluhm
|
#
47517268 |
| 27-Mar-2024 |
mvs <mvs@openbsd.org> |
Introduce SB_OWNLOCK to mark sockets whose `so_rcv' buffer is modified outside the socket lock.
The `sb_mtx' mutex(9) is used in this case, and it should not be released between the `so_rcv' usage check and the corresponding sbwait() sleep. Otherwise a wakeup() could occasionally be lost.
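The lost-wakeup hazard described here is the classic check-then-sleep race. A minimal userland sketch, with hypothetical names, shows the safe pattern: the predicate is re-checked under the same mutex that the sleep atomically releases, which is exactly why `sb_mtx' must not be dropped between the usage check and sbwait().

```c
/* Illustrative userland sketch of the race the commit closes: if the
 * mutex protecting the buffer were dropped between the "is there
 * data?" check and the sleep, a wakeup arriving in that window would
 * be lost.  The names here are hypothetical, not the kernel's. */
#include <assert.h>
#include <pthread.h>

struct rcvbuf {
	pthread_mutex_t	mtx;	/* plays the role of `sb_mtx' */
	pthread_cond_t	cv;	/* plays the role of sbwait()'s channel */
	int		cc;	/* bytes available */
};

/* Sleep until data arrives.  pthread_cond_wait() releases `mtx' and
 * blocks atomically, so a producer cannot signal between the check and
 * the sleep -- the equivalent of not releasing `sb_mtx' between the
 * `so_rcv' usage check and sbwait(). */
static int
wait_for_data(struct rcvbuf *rb)
{
	int cc;

	pthread_mutex_lock(&rb->mtx);
	while (rb->cc == 0)
		pthread_cond_wait(&rb->cv, &rb->mtx);
	cc = rb->cc;
	pthread_mutex_unlock(&rb->mtx);
	return (cc);
}

/* Producer: append data and wake the sleeper under the same mutex. */
static void
append_data(struct rcvbuf *rb, int n)
{
	pthread_mutex_lock(&rb->mtx);
	rb->cc += n;
	pthread_cond_signal(&rb->cv);
	pthread_mutex_unlock(&rb->mtx);
}
```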
ok bluhm
|
#
02a82712 |
| 26-Mar-2024 |
mvs <mvs@openbsd.org> |
Use `sb_mtx' to protect `so_rcv' receive buffer of unix(4) sockets.
This makes re-locking unnecessary in the uipc_*send() paths, because locking one socket is enough to prevent the peer from disconnecting concurrently. As a small bonus, one unix(4) socket can now transmit and receive simultaneously, with one exception: uipc_rcvd() still requires the re-lock for connection-oriented sockets.
The socket lock is not held while filt_soread() and filt_soexcept() are called from uipc_*send() through sorwakeup(). However, the unlocked access to `so_options', `so_state' and `so_error' is fine.
The receiving socket can't be, or become, a listening socket. It also can't be disconnected concurrently. This makes the SO_ACCEPTCONN, SS_ISDISCONNECTED and SS_ISCONNECTED bits immutable; they are clear and set, respectively.
`so_error' is set on the peer socket only by unp_detach(), which also can't run concurrently on the sending socket.
This is also true for filt_fiforead() and filt_fifoexcept(). For other callers, like kevent(2) or doaccept(), the socket lock is still held.
ok bluhm
|
#
fdada4b1 |
| 22-Mar-2024 |
mvs <mvs@openbsd.org> |
Use sorflush() instead of direct unp_scan(..., unp_discard) to discard dead unix(4) sockets.
The difference between a direct unp_scan() and sorflush() is the mbuf(9) chain: in the first case it is still linked to `so_rcv'; in the second it is not. This is required to make the `sb_mtx' mutex(9) the only `so_rcv' sockbuf protection and to remove socket re-locking from most of the uipc_*send() paths. The unlinked mbuf(9) chain doesn't require any protection, which allows the sleeping unp_discard() to run locklessly.
Also, the mbuf(9) chain of the discarded socket still contains addresses of file descriptors, and it is much safer to unlink it before FRELE()ing them. This is the reason to commit this diff standalone.
ok bluhm
|
#
94b08229 |
| 12-Feb-2024 |
mvs <mvs@openbsd.org> |
Pass a protosw instead of a domain structure to soalloc() to get the real `pr_type'. The corresponding domain is referenced as `pr_domain'. Otherwise, dp->dom_protosw->pr_type of inet sockets always points to inetsw[0].
ok bluhm
|
#
c75e434f |
| 11-Feb-2024 |
mvs <mvs@openbsd.org> |
Use `sb_mtx' instead of `inp_mtx' in receive path for inet sockets.
In soreceive(), we only touch the `so_rcv' socket buffer, which has its own `sb_mtx' mutex(9) for protection. So we can avoid solock() in this path: it is enough to hold `sb_mtx' in soreceive() and around the corresponding sbappend*(). But not right now :)
Currently we use the shared netlock for some inet sockets in the soreceive() path. To protect the `so_rcv' buffer we use the `inp_mtx' mutex(9), and pru_lock() to acquire this mutex(9) in the socket layer. But the `inp_mtx' mutex belongs to the PCB. We initialize the socket before the PCB, and tcp(4) sockets can exist without a PCB, so use the `sb_mtx' mutex(9) to protect the sockbuf data instead.
This diff mechanically replaces `inp_mtx' with `sb_mtx' in the receive path, but only for sockets which already use `inp_mtx'. All other sockets are left as is; they will be converted later.
Since `sb_mtx' is optional, the new SB_MTXLOCK flag is introduced. If this flag is set in `sb_flags', the `sb_mtx' mutex(9) should be taken. The new sb_mtx_lock() and sb_mtx_unlock() helpers were introduced to hide this check. They are temporary and will be replaced by mtx_enter() once this whole area has been converted to the `sb_mtx' mutex(9).
Also, the new sbmtxassertlocked() function is introduced to throw the corresponding assertion for SB_MTXLOCK-marked buffers. For now, only sbappendaddr() calls it. This function is also temporary and will be replaced by MTX_ASSERT_LOCKED() later.
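A minimal userland sketch of the optional-mutex scheme: the mutex is only taken when a flag says this buffer has been converted, so converted and unconverted sockets can share one code path. The flag value, field names and the `locked' bookkeeping are all hypothetical, not the kernel definitions.

```c
/* Illustrative sketch of the temporary SB_MTXLOCK scheme described
 * above.  The flag and helpers are stand-ins; only the "check the
 * flag, then maybe lock" shape mirrors the commit. */
#include <assert.h>
#include <pthread.h>

#define SKETCH_SB_MTXLOCK 0x01	/* hypothetical: buffer uses its mutex */

struct sbuf {
	int		sb_flags;
	pthread_mutex_t	sb_mtx;
	int		locked;	/* bookkeeping for this sketch only */
};

/* Take the buffer mutex only for converted buffers; otherwise the
 * caller is assumed to hold the big socket lock. */
static void
sb_mtx_lock(struct sbuf *sb)
{
	if (sb->sb_flags & SKETCH_SB_MTXLOCK) {
		pthread_mutex_lock(&sb->sb_mtx);
		sb->locked = 1;
	}
}

static void
sb_mtx_unlock(struct sbuf *sb)
{
	if (sb->sb_flags & SKETCH_SB_MTXLOCK) {
		sb->locked = 0;
		pthread_mutex_unlock(&sb->sb_mtx);
	}
}
```

Once every buffer carries the flag, both helpers collapse to a plain lock/unlock pair, which is why the commit calls them temporary.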
ok bluhm
|
#
104c0a83 |
| 03-Feb-2024 |
mvs <mvs@openbsd.org> |
Rework socket buffers locking for shared netlock.
The shared netlock is not sufficient to call so{r,w}wakeup(): the subsequent sowakeup() modifies `sb_flags' and knote(9) state. Unfortunately, we can't call so{r,w}wakeup() with the `inp_mtx' mutex(9) held either, because sowakeup() also calls pgsigio(), which grabs the kernel lock.
However, the `so*_filtops' callbacks only perform read-only access to the socket data, so it is enough to hold the shared netlock; only the klist data needs extra protection.
This diff introduces the `sb_mtx' mutex(9) to protect the sockbuf. For now, `sb_mtx' is used to protect only `sb_flags' and `sb_klist'.
Now we have soassertlocked_readonly() and soassertlocked(). The first one is happy if only the shared netlock is held, while the second wants `so_lock' or pru_lock() to be held together with the shared netlock.
To keep the soassertlocked*() assertions soft, we need to know the mutex(9) state, so the new mtx_owned() macro was introduced. Also, the new optional (*pru_locked)() handler reports the state of pru_lock().
Tests and ok from bluhm.
|
#
66bd633e |
| 11-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Use domain name for socket lock.
Syzkaller with witness complains about the lock ordering of the pf lock with the socket lock: the socket lock for inet is taken before the pf lock, while the pf lock is taken before the socket lock for route. This is a false positive, as the route and inet socket locks are distinct, but witness does not know that. Name the socket lock after the domain of the socket; the rwlock name is then used in the witness lo_name subtype. Make the domain names more consistent for locking; they were not used anyway. Regardless of the witness problem, a unique lock name for each socket type makes sense.
Reported-by: syzbot+34d22dcbf20d76629c5a@syzkaller.appspotmail.com Reported-by: syzbot+fde8d07ba74b69d0adfe@syzkaller.appspotmail.com OK mvs@
|
#
9f40d4ac |
| 04-Jul-2023 |
mvs <mvs@openbsd.org> |
Introduce SBL_WAIT and SBL_NOINTR sbwait() flags.
This refactoring is another step towards standalone socket buffer locking. sblock() uses the M_WAITOK and M_NOWAIT flags, passed as the third argument, together with the SB_NOINTR flag in `sb_flags', to control sleep behaviour. To perform an uninterruptible acquisition, the SB_NOINTR flag has to be set before the sblock() call. Modifying `sb_flags' requires holding solock() around sblock()/sbunlock(), which makes a standalone call impossible.
Also, modifying `sb_flags' outside sblock()/sbunlock() makes the uninterruptible-acquisition code bulky. Currently only sorflush() does this (and forgets to restore the SB_NOINTR flag, so a shutdown(SHUT_RDWR) call permanently modifies the socket's locking behaviour), which doesn't look like a big problem. But with standalone socket buffer locking there will be many such places, so this bulky construction is unwanted.
Introduce the new SBL_NOINTR flag, passed as the third sblock() argument. The sblock() acquisition will be uninterruptible when either the existing SB_NOINTR flag is set in `sb_flags' or SBL_NOINTR was passed.
The M_WAITOK and M_NOWAIT flags belong to malloc(9), which has no M_NOINTR flag, and there is no reason to introduce one. So, for consistency, introduce the new SBL_WAIT flag and use it together with SBL_NOINTR instead of the malloc(9) flags.
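The resulting decision can be sketched as a tiny predicate. The flag values below are hypothetical; only the "sticky buffer flag OR per-call flag" logic mirrors the description above.

```c
/* Illustrative sketch of the new sblock() flag logic: the acquisition
 * is uninterruptible when either the buffer's sticky SB_NOINTR-style
 * flag is set or the caller passes an SBL_NOINTR-style flag. */
#include <assert.h>

#define SK_SB_NOINTR	0x01	/* sticky, lives in sb_flags */
#define SK_SBL_WAIT	0x01	/* caller may sleep */
#define SK_SBL_NOINTR	0x02	/* caller wants an uninterruptible sleep */

/* Returns 1 when the sleep must not be interrupted by signals. */
static int
sblock_nointr(int sb_flags, int sbl_flags)
{
	return ((sb_flags & SK_SB_NOINTR) != 0 ||
	    (sbl_flags & SK_SBL_NOINTR) != 0);
}
```

The per-call flag is what lets sorflush() request an uninterruptible acquisition without mutating `sb_flags' and leaking that state into later calls.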
ok bluhm
|
#
80b1b8fb |
| 27-Jan-2023 |
mvs <mvs@openbsd.org> |
Replace the selinfo structure with a klist in the sockbuf. There is no reason to keep it; selinfo is just a wrapper around a klist. netstat(1) and libkvm use the socket structure, but don't touch so_{snd,rcv}.sb_sel.
ok visa@
|
#
734e0323 |
| 23-Jan-2023 |
mvs <mvs@openbsd.org> |
Remove "sb == &so->so_rcv || sb == &so->so_snd" assertion from sb_notify() and sbspace(). Now it's overkilling.
ok bluhm@
|
#
a651bdb4 |
| 23-Jan-2023 |
mvs <mvs@openbsd.org> |
Move the SS_ISSENDING flag to `sb_state'. It should belong to the send buffer, like the SS_CANTSENDMORE flag.
ok bluhm@
|
#
4b9bfff3 |
| 22-Jan-2023 |
mvs <mvs@openbsd.org> |
Move the SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to the `sb_state' of the receive buffer. As was done for the SS_CANTSENDMORE bit, the definitions are kept as is, but these bits now belong to the `sb_state' of the receive buffer. `sb_state' is ORed with `so_state' when socket data is exported to userland.
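The export step can be sketched as follows. The bit values are hypothetical; only the ORing of the per-buffer state back into the socket state, which is what keeps the ABI unchanged, mirrors the description above.

```c
/* Illustrative sketch: per-buffer state bits are ORed back into the
 * socket state word when data is handed to userland, so userland
 * still sees one combined state word.  Bit values are hypothetical. */
#include <assert.h>

#define SK_SS_ISCONNECTED  0x0002	/* stays in so_state */
#define SK_SS_CANTRCVMORE  0x0020	/* now lives in the rcv sb_state */

/* What userland sees: the union of both state words. */
static int
export_state(int so_state, int rcv_sb_state)
{
	return (so_state | rcv_sb_state);
}
```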
ok bluhm@
|
#
9e437519 |
| 21-Jan-2023 |
mvs <mvs@openbsd.org> |
Introduce per-sockbuf `sb_state' to use it with SS_CANTSENDMORE.
Currently, the socket's buffer lock requires solock() to be held. As part of the standalone socket buffer locking work, move the socket state bits which represent the buffers' state into per-buffer state.
Unlike the previously reverted diff, the SS_CANTSENDMORE definition is left as is, but it is used only with `sb_state'. `sb_state' is ORed with the original `so_state' when the socket's data is exported to userland, so the ABI is kept as it was.
Inputs from deraadt@.
ok bluhm@
|
#
33da1efb |
| 12-Dec-2022 |
tb <tb@openbsd.org> |
Revert sb_state changes to unbreak tree.
|
#
cba97bf9 |
| 11-Dec-2022 |
mvs <mvs@openbsd.org> |
Currently, the socket's buffer lock requires solock() to be held. As part of the standalone socket buffer locking work, move the socket state bits which represent the buffers' state into per-buffer state. Introduce `sb_state' and turn SS_CANTSENDMORE into SBS_CANTSENDMORE. This bit will be processed on the `so_snd' buffer only.
The SS_CANTRCVMORE and SS_RCVATMARK bits will be moved in a separate diff, to make review easier and to exclude possible so_rcv/so_snd mistypes.
Also, don't adjust the remaining SS_* bits right now.
ok millert@
|
#
d452b926 |
| 26-Nov-2022 |
mvs <mvs@openbsd.org> |
Turn the sowriteable(), sballoc() and sbfree() macros into inline functions.
soreadable() is already an inline function, but the corresponding sowriteable() is still a macro. There is also no reason to keep sballoc() and sbfree() as macros.
The first argument of sballoc() and sbfree() is not used, but keep it for a while.
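A minimal sketch of what such a conversion looks like, with simplified, hypothetical names and fields: the inline form gains argument type checking and single evaluation of its arguments, which a statement macro does not guarantee.

```c
/* Illustrative before/after for the macro-to-inline conversion.  The
 * structure and accounting are simplified stand-ins, not the kernel's
 * sockbuf. */
#include <assert.h>

struct sk_sockbuf {
	long	sb_cc;		/* bytes queued */
};

/* Old style, kept for comparison: a bare statement macro. */
#define SK_SBALLOC_MACRO(sb, len)	((sb)->sb_cc += (len))

/* New style: same behaviour, but a real, type-checked function.  The
 * unused first argument mirrors the commit keeping it "for a while". */
static inline void
sk_sballoc(void *so_unused, struct sk_sockbuf *sb, long len)
{
	(void)so_unused;
	sb->sb_cc += len;
}
```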
ok kn@ bluhm@
|
#
62440853 |
| 03-Oct-2022 |
bluhm <bluhm@openbsd.org> |
System calls should not fail due to a temporary memory shortage in malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During the socket(2) syscall it is ok to wait; this logic was missing for the internet PCB. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during the TCP 3-way handshake; this logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@
|
#
9bbbb445 |
| 05-Sep-2022 |
bluhm <bluhm@openbsd.org> |
Use the shared netlock in soreceive(). The UDP and IP divert layers provide locking of the PCB. Where that is possible, use the shared instead of the exclusive netlock in soreceive(). The PCB mutex provides a per-socket lock against multiple soreceive() running in parallel. Release and regrab both locks in sosleep_nsec(). OK mvs@
|
#
70f8daa1 |
| 01-Sep-2022 |
jsg <jsg@openbsd.org> |
remove sb_lock() prototype; removed in uipc_socket2.c 1.64
|
#
621c49c5 |
| 21-Aug-2022 |
mvs <mvs@openbsd.org> |
Change the soabort() return value to void. We are never interested in it.
ok bluhm@
|
#
69690c9b |
| 13-Aug-2022 |
mvs <mvs@openbsd.org> |
Introduce the pru_*() wrappers for corresponding (*pr_usrreq)() calls.
This is helpful for the following split of (*pr_usrreq)() into multiple handlers, but it already makes the code more readable.
Also add '#ifndef _SYS_SOCKETVAR_H_' guards to sys/socketvar.h. This prevents collisions when both sys/protosw.h and sys/socketvar.h are included together. Both the 'socket' and 'protosw' structures are required to be defined before the pru_*() wrappers, so we need to include sys/socketvar.h from sys/protosw.h.
ok bluhm@
|
#
88d6f328 |
| 15-Jul-2022 |
deraadt <deraadt@openbsd.org> |
pledge "getpw" would notice access to /var/run/ypbind.lock, and grant "inet" rights, so that libc/yp could access YP services via a fairly complex 'protocol' including file access, sockets, etc. Thi
pledge "getpw" would notice access to /var/run/ypbind.lock, and grant "inet" rights, so that libc/yp could access YP services via a fairly complex 'protocol' including file access, sockets, etc. This YP protocol is also used by ypldap -- this is our way of bringing 'NIS' services into libc without monster sub-libraries. I have managed to remove this "inet" right by creating a new ypconnect() system call, which performs parts of the yp_bind.c dance inside the kernel.. It checks if domainname is set, looks for a binding file with advisory lock, reads it to get the IP and udp/tcp port numbers, and then establishes a connnected socket direct to that ypserv. This socket has a SS_YP flag set, and non-required system calls are prohibited. libc maintains lifetime on this socket so a process should never see it, but it seems safer to block udp re-connect and other calls even in non-pledge mode. Userland changes to use this will follow in a few days. Lots of help from claudio and jmatthew, also ok miod
|