#
3ceb73ad |
| 14-May-2024 |
bluhm <bluhm@openbsd.org> |
Sanity check for TSO payload length in TCP chopper.
Although it should not happen, check that ph_mss is not 0 in tcp_chopper(). This could catch errors in the LRO path of network drivers. Better c
Sanity check for TSO payload length in TCP chopper.
Although it should not happen, check that ph_mss is not 0 in tcp_chopper(). This could catch errors in the LRO path of network drivers. Better count bad packet and drop it rather than ending in an endless loop. The new logic is analog to a recent change in the hardware TSO path in the drivers.
OK jan@
show more ...
|
#
ace0f189 |
| 17-Apr-2024 |
bluhm <bluhm@openbsd.org> |
Use struct ipsec_level within inpcb.
Instead of passing around u_char[4], introduce struct ipsec_level that contains 4 ipsec levels. This provides better type safety. The embedding struct inpcb is
Use struct ipsec_level within inpcb.
Instead of passing around u_char[4], introduce struct ipsec_level that contains 4 ipsec levels. This provides better type safety. The embedding struct inpcb is globally visible for netstat(1), so put struct ipsec_level outside of #ifdef _KERNEL.
OK deraadt@ mvs@
show more ...
|
#
94c0e2bd |
| 13-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Merge struct route and struct route_in6.
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/ro
Merge struct route and struct route_in6.
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embeded there.
OK claudio@
show more ...
|
#
940d25ac |
| 11-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Remove include netinet6/ip6_var.h from netinet/in_pcb.h.
OK mvs@
|
#
2551e577 |
| 26-Nov-2023 |
bluhm <bluhm@openbsd.org> |
Remove inp parameter from ip_output().
ip_output() received inp as parameter. This is only used to lookup the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant
Remove inp parameter from ip_output().
ip_output() received inp as parameter. This is only used to lookup the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant data is passed around. Convert ip_output() to receive constant inp_seclevel as argument and mark it as protected by net lock.
OK mvs@
show more ...
|
#
9e96aff0 |
| 06-Jul-2023 |
bluhm <bluhm@openbsd.org> |
Convert tcp_now() time counter to 64 bit.
After changing tcp now tick to milliseconds, 32 bits will wrap around after 49 days of uptime. That may be a problem in some places of our stack. Better u
Convert tcp_now() time counter to 64 bit.
After changing tcp now tick to milliseconds, 32 bits will wrap around after 49 days of uptime. That may be a problem in some places of our stack. Better use a 64 bit counter.
As timestamp option is 32 bit in TCP protocol, use the lower 32 bit there. There are casts to 32 bits that should behave correctly.
Start with random 63 bit offset to avoid uptime leakage. 2^63 milliseconds result in 2.9*10^8 years of possible uptime.
OK yasuoka@
show more ...
|
#
e790ea0e |
| 04-Jul-2023 |
bluhm <bluhm@openbsd.org> |
Remove redundant code when calculating checksum. OK jmatthew@
|
#
510f4386 |
| 15-May-2023 |
bluhm <bluhm@openbsd.org> |
Implement the TCP/IP layer for hardware TCP segmentation offload. If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface lay
Implement the TCP/IP layer for hardware TCP segmentation offload. If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support ony one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is missnamed, but keep that for now. Note that drivers do not set TSO capabilites yet. Also the ifconfig flags and pseudo interfaces capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@
show more ...
|
#
55055d61 |
| 13-May-2023 |
bluhm <bluhm@openbsd.org> |
Instead of implementing IPv4 header checksum creation everywhere, introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out(). OK claudio@
|
#
c06845b1 |
| 10-May-2023 |
bluhm <bluhm@openbsd.org> |
Implement TCP send offloading, for now in software only. This is meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large
Implement TCP send offloading, for now in software only. This is meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec needs the software fallback in general. For performance comparison or to workaround possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
show more ...
|
#
7e69e494 |
| 25-Apr-2023 |
bluhm <bluhm@openbsd.org> |
Fix white space.
|
#
00007ca3 |
| 07-Nov-2022 |
yasuoka <yasuoka@openbsd.org> |
Modify TCP receive buffer size auto scaling to use the smoothed RTT (SRTT) instead of the timestamp option. Since the timestamp option is disabled on some OSs (eg. Windows) or dropped by some firewa
Modify TCP receive buffer size auto scaling to use the smoothed RTT (SRTT) instead of the timestamp option. Since the timestamp option is disabled on some OSs (eg. Windows) or dropped by some firewalls/routers, in such a case the window size had been fixed at 16KB, this limits throughput at very low on high latency networks. Also replace "tcp_now" from 2HZ tick counter to binuptime in milliseconds to calculate the SRTT better.
tested by krw matthieu jmatthew dlg djm stu stsp ok claudio
show more ...
|
#
8c664ca5 |
| 03-Sep-2022 |
bluhm <bluhm@openbsd.org> |
Use a mutex to update tcp_maxidle, tcp_iss, and tcp_now. This removes pressure from the exclusive netlock in tcp_slowtimo(). Reading is done atomically. Ensure that the tcp_now value is read only o
Use a mutex to update tcp_maxidle, tcp_iss, and tcp_now. This removes pressure from the exclusive netlock in tcp_slowtimo(). Reading is done atomically. Ensure that the tcp_now value is read only once per function to provide consistent time. OK yasuoka@
show more ...
|
#
ced6d44d |
| 11-Aug-2022 |
claudio <claudio@openbsd.org> |
Add TCP_INFO support to getsockopt for tcp sessions.
TCP_INFO provides a lot of information about the TCP session of this socket. Many processes like to peek at the rtt of a connection but this also
Add TCP_INFO support to getsockopt for tcp sessions.
TCP_INFO provides a lot of information about the TCP session of this socket. Many processes like to peek at the rtt of a connection but this also provides a lot of more special info for use by e.g. tcpbench(1). While the basic minimal info is available all the time the more specific data is only populated for privileged processes. This is done to not share data back to userland that may allow to attack a session. TCP_INFO is available to pledge "inet" since pledged processes like chrome tend to use TCP_INFO when available. OK bluhm@
show more ...
|
#
4b0e5db3 |
| 25-Nov-2021 |
bluhm <bluhm@openbsd.org> |
Implement reference counting for IPsec tdbs. Not all cases are covered yet, more ref counts to come. The timeouts are protected, so the racy tdb_reaper() gets retired. The tdb_policy_head, onext a
Implement reference counting for IPsec tdbs. Not all cases are covered yet, more ref counts to come. The timeouts are protected, so the racy tdb_reaper() gets retired. The tdb_policy_head, onext and inext lists are protected. All gettdb...() functions return a tdb that is ref counted and has to be unrefed later. A flag ensures that tdb_delete() is called only once. Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
show more ...
|
#
d86f40e2 |
| 08-Feb-2021 |
jan <jan@openbsd.org> |
Remove maxburst feature from tcp_output
OK bluhm@, claudio@, deraadt@
|
#
aa794c2b |
| 25-Jan-2021 |
dlg <dlg@openbsd.org> |
if stoeplitz is enabled, use it to provide a flowid for tcp packets.
drivers that implement rss and multiple rings depend on the symmetric toeplitz code, and use it to generate a key that decides wi
if stoeplitz is enabled, use it to provide a flowid for tcp packets.
drivers that implement rss and multiple rings depend on the symmetric toeplitz code, and use it to generate a key that decides with rx ring a packet lands on. if the toeplitz code is enabled, this diff has the pcb and tcp layer use the toeplitz code to generate a flowid for packets they send, which in turn is used to pick a tx ring. because the nic and the stack use the same key, the tx and rx sides end up with the same hash/flowid. at the very least this means that the same rx and tx queue pair on a particular nic are used for both sides of the connection. as the stack becomes more parallel, it will also help keep both sides of the tcp connection processing in the one place.
show more ...
|
#
097c9a81 |
| 10-Nov-2018 |
bluhm <bluhm@openbsd.org> |
Do not translate the EACCES error from pf(4) to EHOSTUNREACH anymore. It also translated a documented send(2) EACCES case erroneously. This was too much magic and always prone to errors. from Jan Kle
Do not translate the EACCES error from pf(4) to EHOSTUNREACH anymore. It also translated a documented send(2) EACCES case erroneously. This was too much magic and always prone to errors. from Jan Klemkow; man page jmc@; OK claudio@
show more ...
|
#
b5b7f62e |
| 09-Nov-2018 |
claudio <claudio@openbsd.org> |
M_LEADINGSPACE() and M_TRAILINGSPACE() are just wrappers for m_leadingspace() and m_trailingspace(). Convert all callers to call directly the functions and remove the defines. OK krw@, mpi@
|
#
0e560947 |
| 13-Sep-2018 |
bluhm <bluhm@openbsd.org> |
Add reference counting for inet pcb, this will be needed when we start locking the socket. An inp can be referenced by the PCB queue and hashes, by a pf mbuf header, or by a pf state key. OK visa@
|
#
4e64d49b |
| 11-Jun-2018 |
bluhm <bluhm@openbsd.org> |
The output from tcp debug sockets was incomplete. After detach tp was NULL and nothing was traced. So save the old tcpcb and use that to retrieve some information. Note that otb may be freed and m
The output from tcp debug sockets was incomplete. After detach tp was NULL and nothing was traced. So save the old tcpcb and use that to retrieve some information. Note that otb may be freed and must not be dereferenced. Use a heuristic for cases where the address family is in the IP header but not provided in the PCB. OK visa@
show more ...
|
#
5bcca80f |
| 08-May-2018 |
bluhm <bluhm@openbsd.org> |
Historically there were slow and fast tcp timeouts. That is why the delack timer had a different implementation. Use the same mechanism for all TCP timer. OK mpi@ visa@
|
#
e441a72a |
| 25-Oct-2017 |
job <job@openbsd.org> |
Remove the TCP_FACK option and associated #if{,n}def code.
TCP_FACK was disabled by provos@ in June 1999. TCP_FACK is an algorithm that decides that when something is lost, all not SACKed packets un
Remove the TCP_FACK option and associated #if{,n}def code.
TCP_FACK was disabled by provos@ in June 1999. TCP_FACK is an algorithm that decides that when something is lost, all not SACKed packets until the most forward SACK are lost. It may be a correct estimate, if network does not reorder packets.
OK visa@ mpi@ mikeb@
show more ...
|
#
86385160 |
| 22-Oct-2017 |
mikeb <mikeb@openbsd.org> |
Unconditionally enable TCP selective acknowledgements (SACK)
OK deraadt, mpi, visa, job
|
#
06235387 |
| 26-Jun-2017 |
mpi <mpi@openbsd.org> |
Assert that the corresponding socket is locked when manipulating socket buffers.
This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not
Assert that the corresponding socket is locked when manipulating socket buffers.
This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not necessarilly MP-safe. All the fields of 'struct socket' aren't protected.
Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD.
Tested by Hrvoje Popovski.
ok claudio@, bluhm@, mikeb@
show more ...
|