#
f11c1ce4 |
| 05-Nov-2024 |
bluhm <bluhm@openbsd.org> |
Replace rwlock with iterator in UDP input multicast loop.
The broadcast and multicast loop in udp_input() is protected by the table mutex. The relevant PCBs were collected in a separate list, which was processed while the table notify rwlock was held. When sending UDP multicast packets over vxlan(4) configured over UDP with multicast groups, this lock was taken recursively, causing a kernel crash. With an iterator, traversing the PCB list of the table no longer requires holding the mutex the whole time. The mutex is taken only briefly, while accessing the next element after the iterator. udp_sbappend() and the upcall to vxlan_input() are done with neither mutex nor rwlock held. The PCB is reference counted while traversing the list.
crash reported by Holger Glaess; iterator implemented by mvs@; tested and fixed by bluhm@; OK mvs@
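The iterator idea can be illustrated with a small userland model. This is only a sketch under stated assumptions, not the kernel code: `toy_mutex`, `pcb_iterate()` and `demo_deliver_all()` are hypothetical names, the mutex is a single-threaded stand-in that asserts on recursive locking, and the iterator is modelled as a placeholder element kept in the list behind the PCB currently being processed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Toy single-threaded stand-in for a kernel mutex; asserts on recursion. */
struct toy_mutex { int held; };
static void toy_enter(struct toy_mutex *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mutex *m) { assert(m->held); m->held = 0; }

struct pcb {
	TAILQ_ENTRY(pcb) entry;
	int is_iterator;	/* placeholder element, not a real PCB */
	int refcnt;
	int port;
};
TAILQ_HEAD(pcb_queue, pcb);

struct pcb_table {
	struct toy_mutex mtx;
	struct pcb_queue queue;
};

/*
 * Step to the next real PCB.  The mutex is held only while the iterator
 * is moved.  The returned PCB carries an extra reference, which is
 * dropped on the following call.  Returns NULL at the end of the list.
 */
static struct pcb *
pcb_iterate(struct pcb_table *t, struct pcb *prev, struct pcb *iter)
{
	struct pcb *next;

	toy_enter(&t->mtx);
	if (prev != NULL) {
		next = TAILQ_NEXT(iter, entry);
		TAILQ_REMOVE(&t->queue, iter, entry);
		prev->refcnt--;
	} else
		next = TAILQ_FIRST(&t->queue);
	while (next != NULL && next->is_iterator)	/* skip other iterators */
		next = TAILQ_NEXT(next, entry);
	if (next != NULL) {
		next->refcnt++;
		TAILQ_INSERT_AFTER(&t->queue, next, iter, entry);
	}
	toy_leave(&t->mtx);
	return next;
}

/* Visit every PCB with the mutex released; returns the sum of the ports. */
int
demo_deliver_all(void)
{
	struct pcb_table t = { .mtx = { 0 } };
	struct pcb pcbs[3] = {
		{ .port = 1, .refcnt = 1 },
		{ .port = 2, .refcnt = 1 },
		{ .port = 3, .refcnt = 1 },
	};
	struct pcb iter = { .is_iterator = 1 };
	struct pcb *inp = NULL;
	int i, sum = 0;

	TAILQ_INIT(&t.queue);
	for (i = 0; i < 3; i++)
		TAILQ_INSERT_TAIL(&t.queue, &pcbs[i], entry);

	while ((inp = pcb_iterate(&t, inp, &iter)) != NULL) {
		assert(!t.mtx.held);		/* the "upcall" runs unlocked */
		assert(inp->refcnt == 2);	/* pinned by the iterator */
		sum += inp->port;
	}
	for (i = 0; i < 3; i++)
		assert(pcbs[i].refcnt == 1);	/* all references dropped */
	return sum;
}
```

Because the per-PCB work runs with the mutex released, code reached from it may take the table mutex again without recursing. The sketch leaves out early termination of the loop and freeing a PCB when its last reference is dropped.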
|
#
93536db2 |
| 12-Apr-2024 |
bluhm <bluhm@openbsd.org> |
Split single TCP inpcb table into IPv4 and IPv6 parts.
With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions.
OK mvs@
|
#
eb7afff9 |
| 31-Mar-2024 |
bluhm <bluhm@openbsd.org> |
Combine route_cache() and rtalloc_mpath() in new route_mpath().
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and just pass struct in_addr. All the conversions are done inside the functions.
A previous version of this diff was backed out. There was an additional rtisvalid() in rtalloc_mpath() that prevented packet output via interfaces that were not up. Now the route in the cache has to be valid, but after a new lookup, rtalloc_mpath() may return invalid routes. This generates fewer errors in userland and preserves existing behavior.
OK sashan@
|
#
97ca6483 |
| 22-Mar-2024 |
bluhm <bluhm@openbsd.org> |
Make local port which is bound during connect(2) unique per laddr.
in_pcbconnect() did not pass down the address it got from in_pcbselsrc() to in_pcbpickport(). As a consequence local port numbers selected during connect(2) were globally unique although they belong to different addresses. This strict uniqueness is not necessary and wastes usable ports for outgoing connections.
To solve this, pass ina from in_pcbconnect() to in_pcbbind_locked(). This does not interfere with how wildcard sockets are matched with specific sockets during bind(2). It only allows non-wildcard sockets to share a local port during connect(2).
OK mvs@ deraadt@
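The effect of passing the local address down to the port picker can be sketched as follows. This is a simplified model, not the kernel logic: `struct binding`, `port_in_use()` and `pick_port()` are hypothetical names, address 0 stands for the wildcard, and the range is scanned linearly for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* One bound socket: local address (0 = wildcard) and local port. */
struct binding {
	uint32_t laddr;
	uint16_t lport;
};

/*
 * A port conflicts only if the same address, or a wildcard on either
 * side, already uses it.  Passing laddr == 0 therefore reproduces the
 * old, globally unique behaviour.
 */
static int
port_in_use(const struct binding *b, size_t n, uint32_t laddr, uint16_t lport)
{
	for (size_t i = 0; i < n; i++) {
		if (b[i].lport != lport)
			continue;
		if (laddr == 0 || b[i].laddr == 0 || b[i].laddr == laddr)
			return 1;
	}
	return 0;
}

/* Return the first free port in [first, last] for laddr, or 0 if none. */
uint16_t
pick_port(const struct binding *b, size_t n, uint32_t laddr,
    uint16_t first, uint16_t last)
{
	for (uint32_t p = first; p <= last; p++)
		if (!port_in_use(b, n, laddr, (uint16_t)p))
			return (uint16_t)p;
	return 0;
}
```

With two sockets bound to different addresses on ports 1024 and 1025, a wildcard pick over that range finds nothing, while a pick for either specific address can reuse the port held by the other address.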
|
#
231fd943 |
| 29-Feb-2024 |
naddy <naddy@openbsd.org> |
revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"
It breaks NFS.
ok claudio@
|
#
a3112ee6 |
| 27-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Combine route_cache() and rtalloc_mpath() in new route_mpath().
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and just pass struct in_addr. All the conversions are done inside the functions. ro->ro_rt is either valid or NULL. Note that some places have a stricter rtisvalid() now compared to the previous NULL check.
OK claudio@
|
#
caa7f414 |
| 22-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Make the route cache aware of multipath routing.
Pass source address to route_cache() and store it in struct route. Cached multipath routes are only valid if source address matches. If sysctl multipath changes, increase route generation number.
OK claudio@
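A minimal model of such a cache check might look like the sketch below. The names (`struct cached_route`, `route_cache_sketch()`, `CACHE_HIT`, `CACHE_MISS`, `route_generation`) and the plain integer addresses are assumptions for illustration; the real struct route and route_cache() differ in detail.

```c
#include <stdint.h>
#include <string.h>

enum { CACHE_HIT = 0, CACHE_MISS = 1 };

struct cached_route {
	int		valid;		/* a route entry is cached */
	int		is_multipath;	/* entry is one of several paths */
	unsigned long	generation;	/* generation at fill time */
	uint32_t	dst;
	uint32_t	src;
};

/* Bumped whenever routing state changes, e.g. the multipath sysctl. */
unsigned long route_generation = 1;

/*
 * Check the cache for dst.  A multipath entry additionally requires a
 * matching source address, since the path chosen depends on it.  On a
 * miss the cache is reset and primed with the new key, so the caller
 * can do the lookup and mark the entry valid.
 */
int
route_cache_sketch(struct cached_route *ro, uint32_t dst, uint32_t src)
{
	if (ro->valid && ro->generation == route_generation &&
	    ro->dst == dst) {
		if (!ro->is_multipath || ro->src == src)
			return CACHE_HIT;
	}
	memset(ro, 0, sizeof(*ro));
	ro->generation = route_generation;
	ro->dst = dst;
	ro->src = src;
	return CACHE_MISS;
}
```

Incrementing `route_generation` invalidates every cached entry at once without touching the individual caches.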
|
#
94c0e2bd |
| 13-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Merge struct route and struct route_in6.
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there.
OK claudio@
|
#
940d25ac |
| 11-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Remove include netinet6/ip6_var.h from netinet/in_pcb.h.
OK mvs@
|
#
18747b91 |
| 09-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Route cache function returns hit or miss.
The route_cache() function can easily return whether it was a cache hit or miss. Then the logic to perform a route lookup gets a bit simpler. Some more complicated if (ro->ro_rt == NULL) checks still exist elsewhere. Also use route cache in in_pcbselsrc() instead of filling struct route manually.
OK claudio@
|
#
22723314 |
| 07-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Use the route generation number also for IPv6.
Implement route6_cache() to check whether the cached route is still valid and otherwise fill caching parameter of struct route_in6. Also count cache hits and misses in netstat. in_pcbrtentry() uses route cache now.
OK claudio@
|
#
3dc61bc4 |
| 31-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Split in_pcbrtentry() and in6_pcbrtentry() based on INP_IPV6.
Splitting the IPv6 code into a separate function results in less #ifdef INET6. Also struct route_in6 *ro in in6_pcbrtentry() is of the correct type and in_pcbrtentry() does not rely on the fact that inp_route and inp_route6 are pointers to the same union.
OK kn@ claudio@
|
#
7b1356d5 |
| 28-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Use more specific sockaddr type for inpcb notify.
in_pcbnotifyall() is an IPv4-only function. All callers check that sockaddr dst is in fact a sockaddr_in. Pass the more specific type and remove the runtime check at the beginning of in_pcbnotifyall(). Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6 in in6_pcbnotify() as dst parameter.
OK millert@
|
#
ad6c4bdc |
| 09-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Convert some struct inpcb parameter to const pointer.
OK millert@
|
#
6426d56d |
| 07-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Inpcb table mutex protects addr and port during bind(2) and connect(2).
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() have to set addresses and ports within the same critical section as the inpcb hash table calculation. Also lookup and address selection have to be protected to avoid bindings and connections that are not unique.
For that, in_pcbpickport() and in_pcbbind_locked() expect that the table mutex is already taken. The functions in_pcblookup_lock(), in_pcblookup_local_lock(), and in_pcbaddrisavail_lock() grab the mutex iff the lock parameter is IN_PCBLOCK_GRAB. Otherwise the parameter is IN_PCBLOCK_HOLD and the lock has to be taken already. Note that in_pcblookup_lock() and in_pcblookup_local() return an inp with increased reference iff they take and release the lock. Otherwise the caller protects the lifetime of the inp.
This gives enough flexibility that in_pcbbind() and in_pcbconnect() can hold the table mutex when they need it. The public inpcb API does not change.
OK sashan@ mvs@
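The grab-or-hold convention can be sketched in userland as below. Only the two constant names come from the commit message; their values, the toy mutex, and the functions `pcb_lookup_lock()` and `pcb_bind_sketch()` are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy single-threaded stand-in for a kernel mutex; asserts on recursion. */
struct toy_mutex { int held; };
static void toy_enter(struct toy_mutex *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mutex *m) { assert(m->held); m->held = 0; }

/* Values are arbitrary in this sketch. */
enum pcb_lock { IN_PCBLOCK_HOLD, IN_PCBLOCK_GRAB };

struct pcb { int port; int refcnt; };	/* port 0 means unbound */

struct pcb_table {
	struct toy_mutex mtx;
	struct pcb *pcbs;
	size_t n;
};

/*
 * GRAB: take and release the mutex here, and return the PCB with an
 * extra reference so it stays alive after the unlock.
 * HOLD: the caller already owns the mutex and protects the lifetime.
 */
struct pcb *
pcb_lookup_lock(struct pcb_table *t, int port, enum pcb_lock lock)
{
	struct pcb *found = NULL;

	if (lock == IN_PCBLOCK_GRAB)
		toy_enter(&t->mtx);
	else
		assert(t->mtx.held);

	for (size_t i = 0; i < t->n; i++) {
		if (t->pcbs[i].port == port) {
			found = &t->pcbs[i];
			break;
		}
	}

	if (lock == IN_PCBLOCK_GRAB) {
		if (found != NULL)
			found->refcnt++;
		toy_leave(&t->mtx);
	}
	return found;
}

/* Uniqueness check and port assignment share one critical section. */
int
pcb_bind_sketch(struct pcb_table *t, struct pcb *inp, int port)
{
	int error = 0;

	toy_enter(&t->mtx);
	if (pcb_lookup_lock(t, port, IN_PCBLOCK_HOLD) != NULL)
		error = EADDRINUSE;
	else
		inp->port = port;
	toy_leave(&t->mtx);
	return error;
}
```

Keeping the lookup and the assignment under the same mutex hold is what prevents two concurrent binds from both seeing the port as free.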
|
#
921ffa12 |
| 03-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Rename all in6p local variables to inp.
There exists no struct in6pcb in OpenBSD, this was an old kame idea. Calling the local variable in6p does not make sense, it is actually a struct inpcb. Also in6p is not used consistently in inet6 code. Having the same convention for IPv4 and IPv6 is less confusing.
OK sashan@ mvs@
|
#
ab485656 |
| 03-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Use INP_IPV6 flag instead of sotopf().
During initialization in_pcballoc() sets INP_IPV6 once to avoid reaching through inp_socket->so_proto->pr_domain->dom_family. Use this flag consistently.
OK sashan@ mvs@
|
#
cd28665a |
| 01-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Set inp address, port and rtable together with inpcb hash.
The inpcb hash table is protected by table->inpt_mtx. The hash is based on addresses, ports, and routing table. These fields were not synchronized with the hash. Put writes and hash update into the same critical section. Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(), tcp_connect(), udp_disconnect() to dedicated inpcb set functions. There they use the same table mutex as in_pcbrehash(). in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work and are not included yet.
OK sashan@ mvs@
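A dedicated set function of this kind could be modelled as in the following sketch. All names (`toy_mutex`, `pcb_set_port()`, `pcb_find()`, `NBUCKETS`) are hypothetical, and the "hash" is simply the port modulo the bucket count. The point is only that the field write and the rehash happen under one mutex hold.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8

/* Toy single-threaded stand-in for a kernel mutex; asserts on recursion. */
struct toy_mutex { int held; };
static void toy_enter(struct toy_mutex *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mutex *m) { assert(m->held); m->held = 0; }

struct pcb { struct pcb *next; unsigned port; };

struct pcb_table {
	struct toy_mutex mtx;
	struct pcb *buckets[NBUCKETS];
};

/* Remove inp from the bucket matching its current port, if linked. */
static void
pcb_unlink(struct pcb_table *t, struct pcb *inp)
{
	struct pcb **pp;

	assert(t->mtx.held);
	for (pp = &t->buckets[inp->port % NBUCKETS]; *pp != NULL;
	    pp = &(*pp)->next) {
		if (*pp == inp) {
			*pp = inp->next;
			break;
		}
	}
}

/* Write the field and rehash inside the same critical section. */
void
pcb_set_port(struct pcb_table *t, struct pcb *inp, unsigned port)
{
	toy_enter(&t->mtx);
	pcb_unlink(t, inp);		/* uses the old port */
	inp->port = port;
	inp->next = t->buckets[port % NBUCKETS];
	t->buckets[port % NBUCKETS] = inp;
	toy_leave(&t->mtx);
}

struct pcb *
pcb_find(struct pcb_table *t, unsigned port)
{
	struct pcb *inp;

	toy_enter(&t->mtx);
	for (inp = t->buckets[port % NBUCKETS]; inp != NULL; inp = inp->next)
		if (inp->port == port)
			break;
	toy_leave(&t->mtx);
	return inp;
}
```

If the port were written outside the mutex, a concurrent lookup could search the bucket for the new port while the PCB is still linked under the old one. In this sketch, calling `pcb_set_port()` on a PCB that is not yet linked simply inserts it.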
|
#
cff23a6b |
| 01-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Make internet PCB connect more consistent.
The public interface is in_pcbconnect(). It dispatches to in6_pcbconnect() if necessary. Call the former from tcp_connect() and udp_connect(). In in6_pcbconnect() initialization in6a = NULL is not necessary. in6_pcbselsrc() sets the pointer, but does not read the value. Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc(). It returns a reference to the address of some internal data structure. We want to be sure that in6_addr is not modified this way. IPv4 in_pcbselsrc() solves this by passing a copy of the address.
OK kn@ sashan@ mvs@
|
#
c7641205 |
| 29-Nov-2023 |
bluhm <bluhm@openbsd.org> |
Document inp_socket as immutable and remove NULL checks.
Struct inpcb field inp_socket is initialized in in_pcballoc(). It is not NULL and never changed.
OK mvs@
|
#
952c6363 |
| 28-Nov-2023 |
bluhm <bluhm@openbsd.org> |
Remove struct inpcb from in6_embedscope() parameters.
rip6_output() did modify inp_outputopts6 temporarily to provide different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6 and inp_moptions6 as separate arguments to in6_embedscope(). Simplify the code that deals with these options in in6_embedscope(). Document inp_moptions and inp_moptions6 as protected by net lock.
OK kn@
|
#
8bde4b77 |
| 24-Jun-2023 |
bluhm <bluhm@openbsd.org> |
Calculate inet PCB SIP hash without table mutex.
Goal is to run UDP input in parallel. Btrace kstack analysis shows that SIP hash for PCB lookup is quite expensive. When running in parallel, there is also lock contention on the PCB table mutex.
It results in better performance to calculate the hash value before taking the mutex. The hash secret has to be constant as hash calculation must not depend on values protected by the table mutex. Do not reseed anymore when hash table gets resized.
Analysis also shows that asserting a rw_lock while holding a mutex is a bit expensive. Just remove the netlock assert.
OK dlg@ mvs@
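The ordering described above, hash first and lock second, can be sketched as follows. This is a userland model with hypothetical names (`toy_mutex`, `pcb_hash()`, `pcb_lookup()`, `pcb_insert()`), and it uses an FNV-1a style mixer as a stand-in for SipHash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy single-threaded stand-in for a kernel mutex; asserts on recursion. */
struct toy_mutex { int held; };
static void toy_enter(struct toy_mutex *m) { assert(!m->held); m->held = 1; }
static void toy_leave(struct toy_mutex *m) { assert(m->held); m->held = 0; }

struct pcb { struct pcb *next; uint32_t faddr; uint16_t fport; };

struct pcb_table {
	struct toy_mutex mtx;
	uint64_t secret;	/* set once at init, never reseeded */
	struct pcb **buckets;	/* protected by mtx */
	size_t mask;		/* bucket count - 1, protected by mtx */
};

/* Keyed hash over the lookup key; reads only the constant secret. */
static uint64_t
pcb_hash(const struct pcb_table *t, uint32_t faddr, uint16_t fport)
{
	uint64_t h = 14695981039346656037ULL ^ t->secret;
	uint64_t key = ((uint64_t)faddr << 16) | fport;

	for (int i = 0; i < 8; i++) {
		h ^= (key >> (8 * i)) & 0xff;
		h *= 1099511628211ULL;
	}
	return h;
}

struct pcb *
pcb_lookup(struct pcb_table *t, uint32_t faddr, uint16_t fport)
{
	struct pcb *inp;
	uint64_t hash;

	hash = pcb_hash(t, faddr, fport);	/* expensive part, unlocked */

	toy_enter(&t->mtx);
	/* Only the cheap masking depends on the current table size. */
	for (inp = t->buckets[hash & t->mask]; inp != NULL; inp = inp->next)
		if (inp->faddr == faddr && inp->fport == fport)
			break;
	toy_leave(&t->mtx);
	return inp;
}

void
pcb_insert(struct pcb_table *t, struct pcb *inp)
{
	uint64_t hash = pcb_hash(t, inp->faddr, inp->fport);

	toy_enter(&t->mtx);
	inp->next = t->buckets[hash & t->mask];
	t->buckets[hash & t->mask] = inp;
	toy_leave(&t->mtx);
}
```

Since the secret never changes, the full hash value stays valid across a table resize. Only the mask applied under the mutex changes.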
|
#
c3a3d609 |
| 03-Sep-2022 |
mvs <mvs@openbsd.org> |
Move PRU_PEERADDR request to (*pru_peeraddr)().
Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case.
Also remove *_usrreq() handlers.
ok bluhm@
|
#
0dc53d81 |
| 03-Sep-2022 |
mvs <mvs@openbsd.org> |
Move PRU_SOCKADDR request to (*pru_sockaddr)()
Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.
The key management and route domain sockets return an EINVAL error for the PRU_SOCKADDR request, so keep this behaviour for a while instead of making the pru_sockaddr handler optional and returning EOPNOTSUPP.
ok bluhm@
|
#
a6b8fd29 |
| 30-Aug-2022 |
bluhm <bluhm@openbsd.org> |
Refactor internet PCB lookup function. Rename in_pcbhashlookup() so the public API is in_pcblookup() and in_pcblookup_listen(). For internal use introduce in_pcbhash_insert() and in_pcbhash_lookup() to avoid code duplication. Routing domain is unsigned, change the type to u_int. OK mvs@
|