History log of /openbsd/sys/netinet/ip_output.c (Results 1 – 25 of 401)
Revision Date Author Comments
# 0e25137a 02-Jul-2024 bluhm <bluhm@openbsd.org>

Read IPsec forwarding information once.

Fix MP race between reading ip_forwarding in ip_input() and checking
ip_forwarding == 2 in ip_output(). In theory ip_forwarding could
be 2 during ip_input()

Read IPsec forwarding information once.

Fix MP race between reading ip_forwarding in ip_input() and checking
ip_forwarding == 2 in ip_output(). In theory ip_forwarding could
be 2 during ip_input() and later 0 in ip_output(). Then a packet
would be forwarded that was never allowed. Currently exclusive
netlock in sysctl(2) prevents all races.

Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter
that was introduced for IP_FORWARDING.

Instead of calling m_tag_find(), traversing the list, and comparing
with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading
ipsec_in_use in ip_output() is a performance hack that is not
necessary. New code only checks tree bits.

OK mvs@

show more ...


# 84b2c343 07-Jun-2024 bluhm <bluhm@openbsd.org>

Read IP forwarding variables only once.

Do not assume that ip_forwarding and ip_directedbcast cannot change
while processing one packet. Read it once and pass down its value
with a flag. This is n

Read IP forwarding variables only once.

Do not assume that ip_forwarding and ip_directedbcast cannot change
while processing one packet. Read it once and pass down its value
with a flag. This is necessary for unlocking the sysctl path.
There are a few places where a consistent value does not really
matter, they are unchanged. Use a proper ip_ prefix for the global
variable.

OK claudio@

show more ...


# a1db6f2d 16-May-2024 bluhm <bluhm@openbsd.org>

Fix IPsec in use with IP forwarding 2 logic.

If sysctl net.inet.ip.forwarding is 2, only packets processed by
IPsec are forwarded. Variable ipsec_in_use is a shortcut to avoid
IPsec processing if n

Fix IPsec in use with IP forwarding 2 logic.

If sysctl net.inet.ip.forwarding is 2, only packets processed by
IPsec are forwarded. Variable ipsec_in_use is a shortcut to avoid
IPsec processing if no policy has been configured. With ipsec_in_use
unset and ipforwarding set to IPsec only, the packet must be dropped.

OK claudio@

show more ...


# ace0f189 17-Apr-2024 bluhm <bluhm@openbsd.org>

Use struct ipsec_level within inpcb.

Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is

Use struct ipsec_level within inpcb.

Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is globally visible for netstat(1), so
put struct ipsec_level outside of #ifdef _KERNEL.

OK deraadt@ mvs@

show more ...


# f46106f1 09-Apr-2024 bluhm <bluhm@openbsd.org>

Plug route leak in IP output.

If no struct route is passed to ip_output() or ip6_output(), it
uses its own iproute on the stack. In that case any route entry
in the local route cache has to be free

Plug route leak in IP output.

If no struct route is passed to ip_output() or ip6_output(), it
uses its own iproute on the stack. In that case any route entry
in the local route cache has to be freed. After pf decides to
reroute, struct route is reset to NULL. Then the route reference
counter has to be released. Call rtfree() without needless NULL
check.

OK mvs@

show more ...


# caa7f414 22-Feb-2024 bluhm <bluhm@openbsd.org>

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multip

Make the route cache aware of multipath routing.

Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.

OK claudio@

show more ...


# 94c0e2bd 13-Feb-2024 bluhm <bluhm@openbsd.org>

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/ro

Merge struct route and struct route_in6.

Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embeded there.

OK claudio@

show more ...


# 029c6615 31-Jan-2024 bluhm <bluhm@openbsd.org>

Add route generation number to route cache.

The outgoing route is cached at the inpcb. This cache was only
invalidated when the socket closes or if the route gets invalid.
More specific routes were

Add route generation number to route cache.

The outgoing route is cached at the inpcb. This cache was only
invalidated when the socket closes or if the route gets invalid.
More specific routes were not detected. Especially with dynamic
routing protocols, sockets must be closed and reopened to use the
correct route. Running ping during a route change shows the problem.

To solve this, add a route generation number that is updated whenever
the routing table changes. The lookup in struct route is put into
the route_cache() function. If the generation number is too old,
the cached route gets discarded.

Implement route_cache() for ip_output() and ip_forward() first.
IPv6 and more places will follow.

OK claudio@

show more ...


# bd2b9f52 18-Jan-2024 claudio <claudio@openbsd.org>

Move the rtable_exists() check into in_pcbset_rtableid().
OK bluhm@ mvs@


# cd28665a 01-Dec-2023 bluhm <bluhm@openbsd.org>

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not syc

Set inp address, port and rtable together with inpcb hash.

The inpcb hash table is protected by table->inpt_mtx. The hash is
based on addresses, ports, and routing table. These fields were
not sychronized with the hash. Put writes and hash update into the
same critical section.
Move the updates from ip_ctloutput(), ip6_ctloutput(), syn_cache_get(),
tcp_connect(), udp_disconnect() to dedicated inpcb set functions.
There they use the same table mutex as in_pcbrehash().
in_pcbbind(), in_pcbconnect(), and in6_pcbconnect() need more work
and are not included yet.

OK sashan@ mvs@

show more ...


# 2551e577 26-Nov-2023 bluhm <bluhm@openbsd.org>

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant

Remove inp parameter from ip_output().

ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.

OK mvs@

show more ...


# 5ebaba9d 07-Jul-2023 bluhm <bluhm@openbsd.org>

Fix path MTU discovery for TCP LRO/TSO when forwarding.

When doing LRO (Large Receive Offload), the drivers, currently ix(4)
and lo(4) only, record an upper bound of the size of the original
packets

Fix path MTU discovery for TCP LRO/TSO when forwarding.

When doing LRO (Large Receive Offload), the drivers, currently ix(4)
and lo(4) only, record an upper bound of the size of the original
packets in ph_mss. When sending, either stack or hardware must
chop the packets with TSO (TCP Segmentation Offload) to that size.
That means we have to call tcp_if_output_tso() before ifp->if_output().
Put that logic into if_output_tso() to avoid code duplication. As
TCP packets on the wire do not get larger that way, path MTU discovery
should still work.

tested by and OK jan@

show more ...


# e790ea0e 04-Jul-2023 bluhm <bluhm@openbsd.org>

Remove redundant code when calculating checksum.
OK jmatthew@


# 32c8d29b 22-May-2023 bluhm <bluhm@openbsd.org>

Fix TSO for traffic to a local address on a physical interface.

When sending TCP packets with software TSO to the local address of
a physical interface, the TCP checksum was miscalculated. As the
s

Fix TSO for traffic to a local address on a physical interface.

When sending TCP packets with software TSO to the local address of
a physical interface, the TCP checksum was miscalculated. As the
small MSS is taken from the physical interface, but the large MTU
of the loopback interface is used, large TSO packets are generated,
but sent directly to the loopback interface. There we need the
regular pseudo header checksum and not the modified without packet
length.

To avoid this confusion, use the same decision for checksum generation
in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso().

bug reported and tested by robert@ bket@ Hrvoje Popovski
OK claudio@ jan@

show more ...


# 510f4386 15-May-2023 bluhm <bluhm@openbsd.org>

Implement the TCP/IP layer for hardware TCP segmentation offload.
If the driver of a network interface claims to support TSO, do not
chop the packet in software, but pass it down to the interface
lay

Implement the TCP/IP layer for hardware TCP segmentation offload.
If the driver of a network interface claims to support TSO, do not
chop the packet in software, but pass it down to the interface
layer.
Precalculate parts of the pseudo header checksum, but without the
packet length. The length of all generated smaller packets is not
known yet. Driver and hardware will use the mbuf packet header
field ph_mss to calculate it and update checksum.
Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware
might support ony one protocol family. The old flag IFXF_TSO is
only relevant for large receive offload. It is missnamed, but keep
that for now.
Note that drivers do not set TSO capabilites yet. Also the ifconfig
flags and pseudo interfaces capabilities will be done separately.
So this commit should not change behavior.
heavily based on the work from jan@; OK sashan@

show more ...


# 55055d61 13-May-2023 bluhm <bluhm@openbsd.org>

Instead of implementing IPv4 header checksum creation everywhere,
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out().
OK claudio@


# c06845b1 10-May-2023 bluhm <bluhm@openbsd.org>

Implement TCP send offloading, for now in software only. This is
meant as a fallback if network hardware does not support TSO. Driver
support is still work in progress. TCP output generates large

Implement TCP send offloading, for now in software only. This is
meant as a fallback if network hardware does not support TSO. Driver
support is still work in progress. TCP output generates large
packets. In IP output the packet is chopped to TCP maximum segment
size. This reduces the CPU cycles used by pf. The regular output
could be assisted by hardware later, but pf route-to and IPsec needs
the software fallback in general.
For performance comparison or to workaround possible bugs, sysctl
net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows
TSO counter with chopped and generated packets.
based on work from jan@
tested by jmc@ jan@ Hrvoje Popovski
OK jan@ claudio@

show more ...


# 003cfddf 08-May-2023 bluhm <bluhm@openbsd.org>

The call to in_proto_cksum_out() is only needed before the packet
is passed to ifp->if_output(). The fragment code has its own
checksum calculation and the other paths end in goto bad.
OK claudio@


# b8646e37 07-May-2023 bluhm <bluhm@openbsd.org>

I preparation for TSO in software, cleanup the fragment code. Use
if_output_ml() to send mbuf lists to interfaces. This can be used
for TSO, fragments, ARP and ND6. Rename variable fml to ml. In

I preparation for TSO in software, cleanup the fragment code. Use
if_output_ml() to send mbuf lists to interfaces. This can be used
for TSO, fragments, ARP and ND6. Rename variable fml to ml. In
pf_route6() split the if else block. Put the safety check (hlen +
firstlen < tlen) into ip_fragment(). It makes the code correct in
case the packet is too short to be fragmented. This should not
happen, but other functions also have this logic.
No functional change. OK sashan@

show more ...


# 4daa6442 12-Aug-2022 bluhm <bluhm@openbsd.org>

Remove differences between ip_fragment() and ip6_fragment(). They
do nearly the same thing, so they should look similar.
OK sashan@


# a72277fe 25-May-2022 mvs <mvs@openbsd.org>

Call if_put(9) after we finish with `ia' within ip_getmoptions().

if_put(9) call means we finish work with `ifp' and it could be destroyed.
`ia' is the pointer to 'in_ifaddr' data belongs to `ifp',

Call if_put(9) after we finish with `ia' within ip_getmoptions().

if_put(9) call means we finish work with `ifp' and it could be destroyed.
`ia' is the pointer to 'in_ifaddr' data belongs to `ifp', so we need to
release corresponding `ifp' after we finish deal with `ia'.

`if_addrlist' list destruction and ip_getmoptions() are serialized with
kernel and net locks so this is not critical, but looks inconsistent.

ok bluhm@

show more ...


# 4d544115 04-Jan-2022 yasuoka <yasuoka@openbsd.org>

Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs

ok mvs


# 5ee194bc 23-Dec-2021 bluhm <bluhm@openbsd.org>

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyw

IPsec is not MP safe yet. To allow forwarding in parallel without
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lockup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@

show more ...


# d997d144 20-Dec-2021 mvs <mvs@openbsd.org>

Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed.

Tested by Hrvoje Popovski.
ok bluhm@


# 31a6915f 03-Dec-2021 bluhm <bluhm@openbsd.org>

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo

Add TDB reference counting to ipsp_spd_lookup(). If an output
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@

show more ...


12345678910>>...17