#
0e25137a |
| 02-Jul-2024 |
bluhm <bluhm@openbsd.org> |
Read IPsec forwarding information once.
Fix MP race between reading ip_forwarding in ip_input() and checking ip_forwarding == 2 in ip_output(). In theory ip_forwarding could be 2 during ip_input() and later 0 in ip_output(). Then a packet would be forwarded that was never allowed. Currently exclusive netlock in sysctl(2) prevents all races.
Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter that was introduced for IP_FORWARDING.
Instead of calling m_tag_find(), traversing the list, and comparing with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading ipsec_in_use in ip_output() is a performance hack that is not necessary. New code only checks three bits.
OK mvs@
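The read-once pattern described in this commit can be illustrated with a minimal sketch. It is not the kernel code: the flag values, the helper names forwarding_flags() and may_forward(), and the way the IPsec-done state is passed in are assumptions made for illustration. The point is that the global is read a single time and every later decision uses the derived flags.

```c
#include <assert.h>

/* Hypothetical flag values; the real kernel definitions may differ. */
#define IP_FORWARDING		0x1
#define IP_FORWARDING_IPSEC	0x2

/* Stand-in for the sysctl-controlled global (0 = off, 1 = on, 2 = IPsec only). */
static volatile int ip_forwarding;

/* Hypothetical helper: read the global exactly once and turn it into flags. */
static int
forwarding_flags(void)
{
	int fwd = ip_forwarding;	/* the only read of the global */
	int flags = 0;

	assert(fwd >= 0 && fwd <= 2);
	if (fwd != 0)
		flags |= IP_FORWARDING;
	if (fwd == 2)
		flags |= IP_FORWARDING_IPSEC;
	return flags;
}

/*
 * Hypothetical output-side decision. It looks only at the flags and at
 * whether the packet was already processed by IPsec, never at the global,
 * so a concurrent sysctl change cannot alter the verdict mid-packet.
 */
static int
may_forward(int flags, int ipsec_in_done)
{
	if ((flags & IP_FORWARDING) == 0)
		return 0;
	if ((flags & IP_FORWARDING_IPSEC) && !ipsec_in_done)
		return 0;
	return 1;
}
```

With this shape, a value of 2 seen on input still enforces the IPsec-only rule on output even if the sysctl is changed in between.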
|
#
84b2c343 |
| 07-Jun-2024 |
bluhm <bluhm@openbsd.org> |
Read IP forwarding variables only once.
Do not assume that ip_forwarding and ip_directedbcast cannot change while processing one packet. Read each once and pass down its value with a flag. This is necessary for unlocking the sysctl path. There are a few places where a consistent value does not really matter; they are unchanged. Use a proper ip_ prefix for the global variable.
OK claudio@
|
#
a1db6f2d |
| 16-May-2024 |
bluhm <bluhm@openbsd.org> |
Fix IPsec in use with IP forwarding 2 logic.
If sysctl net.inet.ip.forwarding is 2, only packets processed by IPsec are forwarded. Variable ipsec_in_use is a shortcut to avoid IPsec processing if no policy has been configured. With ipsec_in_use unset and ipforwarding set to IPsec only, the packet must be dropped.
OK claudio@
|
#
ace0f189 |
| 17-Apr-2024 |
bluhm <bluhm@openbsd.org> |
Use struct ipsec_level within inpcb.
Instead of passing around u_char[4], introduce struct ipsec_level that contains 4 ipsec levels. This provides better type safety. The embedding struct inpcb is globally visible for netstat(1), so put struct ipsec_level outside of #ifdef _KERNEL.
OK deraadt@ mvs@
|
#
f46106f1 |
| 09-Apr-2024 |
bluhm <bluhm@openbsd.org> |
Plug route leak in IP output.
If no struct route is passed to ip_output() or ip6_output(), it uses its own iproute on the stack. In that case any route entry in the local route cache has to be freed. After pf decides to reroute, struct route is reset to NULL. Then the route reference counter has to be released. Call rtfree() without needless NULL check.
OK mvs@
|
#
caa7f414 |
| 22-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Make the route cache aware of multipath routing.
Pass source address to route_cache() and store it in struct route. Cached multipath routes are only valid if source address matches. If sysctl multipath changes, increase route generation number.
OK claudio@
|
#
94c0e2bd |
| 13-Feb-2024 |
bluhm <bluhm@openbsd.org> |
Merge struct route and struct route_in6.
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there.
OK claudio@
|
#
029c6615 |
| 31-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Add route generation number to route cache.
The outgoing route is cached at the inpcb. This cache was only invalidated when the socket closes or if the route gets invalid. More specific routes were not detected. Especially with dynamic routing protocols, sockets must be closed and reopened to use the correct route. Running ping during a route change shows the problem.
To solve this, add a route generation number that is updated whenever the routing table changes. The lookup in struct route is put into the route_cache() function. If the generation number is too old, the cached route gets discarded.
Implement route_cache() for ip_output() and ip_forward() first. IPv6 and more places will follow.
OK claudio@
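The generation check described above can be sketched as follows. This is a simplified illustration, not the kernel's route_cache(): the struct layouts, the integer destination, and all names ending in _sketch are assumptions, and the real code also handles reference counting and address families.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the kernel's structures. */
struct rtentry_sketch {
	unsigned int dst;		/* destination this entry routes to */
};

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* cached entry, or NULL */
	unsigned long ro_generation;	/* generation when it was cached */
	unsigned int ro_dst;		/* destination of the cached entry */
};

/* Bumped whenever the (imaginary) routing table changes. */
static unsigned long route_generation = 1;

static void
routing_table_changed(void)
{
	route_generation++;
}

/*
 * Return 0 on a cache hit. On a miss, drop the stale entry, record the
 * new destination and current generation, and return 1 so that the
 * caller performs a fresh lookup and stores the result in ro_rt.
 */
static int
route_cache_sketch(struct route_sketch *ro, unsigned int dst)
{
	assert(ro != NULL);
	if (ro->ro_rt != NULL && ro->ro_generation == route_generation &&
	    ro->ro_dst == dst)
		return 0;
	ro->ro_rt = NULL;	/* a kernel would release its reference here */
	ro->ro_generation = route_generation;
	ro->ro_dst = dst;
	return 1;
}
```

Because every table change bumps the counter, a long-lived socket notices a newly added, more specific route on its next send without having to be closed and reopened.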
|
#
bd2b9f52 |
| 18-Jan-2024 |
claudio <claudio@openbsd.org> |
Move the rtable_exists() check into in_pcbset_rtableid(). OK bluhm@ mvs@
|
#
cd28665a |
| 01-Dec-2023 |
bluhm <bluhm@openbsd.org> |
Set inp address, port and rtable together with inpcb hash.
The inpcb hash table is protected by table->inpt_mtx. The hash is based on addresses, ports, and routing table. These fields were not synchronized with the hash. Put writes and hash update into the same critical section. Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(), tcp_connect(), udp_disconnect() to dedicated inpcb set functions. There they use the same table mutex as in_pcbrehash(). in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work and are not included yet.
OK sashan@ mvs@
|
#
2551e577 |
| 26-Nov-2023 |
bluhm <bluhm@openbsd.org> |
Remove inp parameter from ip_output().
ip_output() received inp as parameter. This is only used to look up the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant data is passed around. Convert ip_output() to receive constant inp_seclevel as argument and mark it as protected by net lock.
OK mvs@
|
#
5ebaba9d |
| 07-Jul-2023 |
bluhm <bluhm@openbsd.org> |
Fix path MTU discovery for TCP LRO/TSO when forwarding.
When doing LRO (Large Receive Offload), the drivers, currently ix(4) and lo(4) only, record an upper bound of the size of the original packets in ph_mss. When sending, either stack or hardware must chop the packets with TSO (TCP Segmentation Offload) to that size. That means we have to call tcp_if_output_tso() before ifp->if_output(). Put that logic into if_output_tso() to avoid code duplication. As TCP packets on the wire do not get larger that way, path MTU discovery should still work.
tested by and OK jan@
|
#
e790ea0e |
| 04-Jul-2023 |
bluhm <bluhm@openbsd.org> |
Remove redundant code when calculating checksum. OK jmatthew@
|
#
32c8d29b |
| 22-May-2023 |
bluhm <bluhm@openbsd.org> |
Fix TSO for traffic to a local address on a physical interface.
When sending TCP packets with software TSO to the local address of a physical interface, the TCP checksum was miscalculated. As the small MSS is taken from the physical interface, but the large MTU of the loopback interface is used, large TSO packets are generated, but sent directly to the loopback interface. There we need the regular pseudo header checksum and not the modified one without packet length.
To avoid this confusion, use the same decision for checksum generation in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso().
bug reported and tested by robert@ bket@ Hrvoje Popovski OK claudio@ jan@
|
#
510f4386 |
| 15-May-2023 |
bluhm <bluhm@openbsd.org> |
Implement the TCP/IP layer for hardware TCP segmentation offload. If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support only one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is misnamed, but keep that for now. Note that drivers do not set TSO capabilities yet. Also the ifconfig flags and pseudo interfaces capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@
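Why the pseudo header checksum can be precalculated without the length follows from the ones' complement sum being order independent: the length can be added in a later step. The sketch below demonstrates this for IPv4 in host byte order. All function names are hypothetical, and byte order handling and the actual mbuf layout are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Partial pseudo header sum: addresses and protocol, but no length. */
static uint16_t
pseudo_sum_nolen(uint32_t src, uint32_t dst, uint8_t proto)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	return fold16(sum);
}

/* Later step, once the size of one generated segment is known. */
static uint16_t
pseudo_sum_addlen(uint16_t partial, uint16_t len)
{
	return fold16((uint32_t)partial + len);
}

/* Reference: the regular pseudo header sum computed in one pass. */
static uint16_t
pseudo_sum_full(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	assert(proto != 0);
	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += len;
	return fold16(sum);
}
```

Adding the length to the partial sum gives the same value as the one-pass computation, which is what lets the stack defer that step to whoever knows the final segment size.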
|
#
55055d61 |
| 13-May-2023 |
bluhm <bluhm@openbsd.org> |
Instead of implementing IPv4 header checksum creation everywhere, introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out(). OK claudio@
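For readers unfamiliar with the calculation being centralized here, the following is a minimal sketch of the standard IPv4 header checksum (the Internet checksum of RFC 1071 applied to the header bytes). The function name ipv4_hdr_cksum_sketch() is hypothetical, and this is not the body of in_hdr_cksum_out(), which operates on mbufs and may defer the work to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the IPv4 header checksum over hlen bytes. The caller is
 * expected to have zeroed the checksum field (bytes 10 and 11) first.
 * Running it over a header with a valid checksum in place returns 0.
 */
static uint16_t
ipv4_hdr_cksum_sketch(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;
	size_t i;

	assert(hdr != NULL && hlen % 2 == 0);
	/* Sum the header as big-endian 16-bit words. */
	for (i = 0; i < hlen; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	/* The checksum is the ones' complement of the folded sum. */
	return (uint16_t)~sum;
}
```

Having one helper for this avoids each output path repeating the zero, sum, and store sequence by hand.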
|
#
c06845b1 |
| 10-May-2023 |
bluhm <bluhm@openbsd.org> |
Implement TCP send offloading, for now in software only. This is meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec need the software fallback in general. For performance comparison or to work around possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
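The arithmetic behind chopping one large TCP packet into segments of at most the maximum segment size can be sketched as below. The helper names are hypothetical and the sketch only covers payload sizes; the real code also copies headers, adjusts sequence numbers, and fixes up checksums for every generated packet.

```c
#include <assert.h>
#include <stddef.h>

/* Number of segments needed when each carries at most mss payload bytes. */
static size_t
tso_segment_count(size_t payload_len, size_t mss)
{
	assert(mss > 0);
	return (payload_len + mss - 1) / mss;
}

/* Payload length of segment number idx (0-based); the last may be short. */
static size_t
tso_segment_len(size_t payload_len, size_t mss, size_t idx)
{
	size_t off = idx * mss;

	assert(mss > 0 && off < payload_len);
	return (payload_len - off < mss) ? payload_len - off : mss;
}
```

For a 4000 byte payload and an MSS of 1460, this yields three segments of 1460, 1460, and 1080 bytes.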
|
#
003cfddf |
| 08-May-2023 |
bluhm <bluhm@openbsd.org> |
The call to in_proto_cksum_out() is only needed before the packet is passed to ifp->if_output(). The fragment code has its own checksum calculation and the other paths end in goto bad. OK claudio@
|
#
b8646e37 |
| 07-May-2023 |
bluhm <bluhm@openbsd.org> |
In preparation for TSO in software, clean up the fragment code. Use if_output_ml() to send mbuf lists to interfaces. This can be used for TSO, fragments, ARP and ND6. Rename variable fml to ml. In pf_route6() split the if else block. Put the safety check (hlen + firstlen < tlen) into ip_fragment(). It makes the code correct in case the packet is too short to be fragmented. This should not happen, but other functions also have this logic. No functional change. OK sashan@
|
#
4daa6442 |
| 12-Aug-2022 |
bluhm <bluhm@openbsd.org> |
Remove differences between ip_fragment() and ip6_fragment(). They do nearly the same thing, so they should look similar. OK sashan@
|
#
a72277fe |
| 25-May-2022 |
mvs <mvs@openbsd.org> |
Call if_put(9) after we finish with `ia' within ip_getmoptions().
An if_put(9) call means we have finished working with `ifp' and it could be destroyed. `ia' points to `in_ifaddr' data that belongs to `ifp', so we need to release the corresponding `ifp' only after we have finished dealing with `ia'.
`if_addrlist' list destruction and ip_getmoptions() are serialized with kernel and net locks so this is not critical, but looks inconsistent.
ok bluhm@
|
#
4d544115 |
| 04-Jan-2022 |
yasuoka <yasuoka@openbsd.org> |
Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and trees. ipsp_ids_lookup() returns `ids' with bumped reference counter. original diff from mvs
ok mvs
|
#
5ee194bc |
| 23-Dec-2021 |
bluhm <bluhm@openbsd.org> |
IPsec is not MP safe yet. To allow forwarding in parallel without dirty hacks, it is better to protect IPsec input and output with kernel lock. Not much is lost as crypto needs the kernel lock anyway. From here we can refine the lock later. Note that there is no kernel lock in the SPD lookup path. Goal is to keep that lock free to allow fast forwarding with non IPsec traffic. tested by Hrvoje Popovski; OK tobhe@
|
#
d997d144 |
| 20-Dec-2021 |
mvs <mvs@openbsd.org> |
Use per-CPU counters for tunnel descriptor block (TDB) statistics. 'tdb_data' struct became unused and was removed.
Tested by Hrvoje Popovski. ok bluhm@
|
#
31a6915f |
| 03-Dec-2021 |
bluhm <bluhm@openbsd.org> |
Add TDB reference counting to ipsp_spd_lookup(). If an output pointer is passed to the function, it will return a refcounted TDB. The ref happens when ipsp_spd_inp() copies the pointer from ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after using it. tested by Hrvoje Popovski; OK mvs@ tobhe@
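The ownership rule stated here, where a lookup hands back a referenced object and the caller must drop that reference, can be sketched as follows. Everything in it is a simplified assumption: the types and the _sketch functions are hypothetical, the counter is a plain int rather than an atomic reference count, and the real ipsp_spd_lookup() takes many more arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDB with a plain (non-atomic) counter. */
struct tdb_sketch {
	int refcnt;
};

/* Hypothetical stand-in for a policy that points at a TDB. */
struct policy_sketch {
	struct tdb_sketch *ipo_tdb;
};

static struct tdb_sketch *
tdb_ref_sketch(struct tdb_sketch *tdb)
{
	if (tdb != NULL)
		tdb->refcnt++;
	return tdb;
}

static void
tdb_unref_sketch(struct tdb_sketch *tdb)
{
	if (tdb == NULL)
		return;
	assert(tdb->refcnt > 0);
	tdb->refcnt--;		/* a kernel would free the TDB at zero */
}

/*
 * If the caller passes an output pointer, copy the policy's TDB pointer
 * into it and take a reference on the caller's behalf. With a NULL
 * output pointer no reference is taken.
 */
static int
spd_lookup_sketch(struct policy_sketch *ipo, struct tdb_sketch **tdbout)
{
	assert(ipo != NULL);
	if (tdbout != NULL)
		*tdbout = tdb_ref_sketch(ipo->ipo_tdb);
	return 0;
}
```

A caller pairs each successful lookup that used an output pointer with exactly one tdb_unref_sketch() call once it is done with the TDB.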
|