History log of /dragonfly/sys/netinet/ip_output.c (Results 1 – 25 of 109)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 2976dea7 03-Mar-2024 Aaron LI <aly@aaronly.me>

sys: Minor fixes to some comments


# 410f8572 22-Dec-2023 Aaron LI <aly@aaronly.me>

kernel: Replace the deprecated m_copy() with m_copym()


# 2b3f93ea 13-Oct-2023 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions which
turns off numerous kernel features and root accesses. These restricti

kernel - Add per-process capability-based restrictions

* This new system allows userland to set capability restrictions which
turns off numerous kernel features and root accesses. These restrictions
are inherited by sub-processes recursively. Once set, restrictions cannot
be removed.

Basic restrictions that mimic an unadorned jail can be enabled without
creating a jail, but generally speaking real security also requires
creating a chrooted filesystem topology, and a jail is still needed
to really segregate processes from each other. If you do so, however,
you can (for example) disable mount/umount and most global root-only
features.

* Add new system calls and a manual page for syscap_get(2) and syscap_set(2)

* Add sys/caps.h

* Add the "setcaps" userland utility and manual page.

* Remove priv.9 and the priv_check infrastructure, replacing it with
a newly designed caps infrastructure.

* The intention is to add path restriction lists and similar features to
improve jailess security in the near future, and to optimize the
priv_check code.

show more ...


# 8a93af2a 08-Jul-2023 Matthew Dillon <dillon@apollo.backplane.com>

network - Remove host-order translations of ipv4 ip_off and ip_len

* Do not translate ip_off and ip_len to host order and then back again
in the network stack. The fields are now left in network

network - Remove host-order translations of ipv4 ip_off and ip_len

* Do not translate ip_off and ip_len to host order and then back again
in the network stack. The fields are now left in network order.

show more ...


Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0
# 1926f587 07-Dec-2020 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip: Allow ip_mreqn support for IP_MULTICAST_IF,IP_{ADD,DROP}_MEMBERSHIP

- ip_mreqn.imr_ifindex takes precendence over ip_mreqn.imr_address,
if it's not 0. This strictly follows what Linux does.
-

ip: Allow ip_mreqn support for IP_MULTICAST_IF,IP_{ADD,DROP}_MEMBERSHIP

- ip_mreqn.imr_ifindex takes precendence over ip_mreqn.imr_address,
if it's not 0. This strictly follows what Linux does.
- Allow ip_mreq for IP_MULTICAST_IF as what Linux does.

While I'm here, remove unnecessary critical section among
IP_MULTICAST_IF, IP_{ADD,DROP}_MEMBERSHIP.

Bump kernel version.

Requested-by: tuxillo

show more ...


# 544d23f4 04-Dec-2020 Sepherosa Ziehau <sephe@dragonflybsd.org>

inet: Port IPPROTO/IP_RECVTOS from FreeBSD.

Bump kernel version.

Requested-by: zrj


Revision tags: v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1
# 755d70b8 21-Apr-2018 Sascha Wildner <saw@online.de>

Remove IPsec and related code from the system.

It was unmaintained ever since we inherited it from FreeBSD 4.8.

In fact, we had two implementations from that time: IPSEC and FAST_IPSEC.
FAST_IPSEC

Remove IPsec and related code from the system.

It was unmaintained ever since we inherited it from FreeBSD 4.8.

In fact, we had two implementations from that time: IPSEC and FAST_IPSEC.
FAST_IPSEC is the implementation to which FreeBSD has moved since, but
it didn't even build in DragonFly.

Fixes for dports have been committed to DeltaPorts.

Requested-by: dillon
Dports-testing-and-fixing: zrj

show more ...


Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2
# 06937ef9 25-Nov-2017 Sascha Wildner <saw@online.de>

Remove faith(4) and faithd(8) from the tree.

FreeBSD did that 3 years ago (r274331). Quoting from their commit msg:

-----8<-----
It looks like industry have chosen different (and more traditional)

Remove faith(4) and faithd(8) from the tree.

FreeBSD did that 3 years ago (r274331). Quoting from their commit msg:

-----8<-----
It looks like industry have chosen different (and more traditional)
stateless/stateful NAT64 as translation mechanism. Last non-trivial
commits to both faith(4) and faithd(8) happened more than 12 years
ago, so I assume it is time to drop RFC3142 in FreeBSD.
----->8-----

Some more info here:

https://lists.freebsd.org/pipermail/freebsd-net/2014-October/040224.html

Discussed-with: sephe

show more ...


Revision tags: v5.0.1, v5.0.0, v5.0.0rc2
# e622598e 30-Sep-2017 Sepherosa Ziehau <sephe@dragonflybsd.org>

ipfw: Implement state based "redirect", i.e. without using libalias.

Redirection creates two states, i.e. one before the translation (xlat0)
and one after the translation (xlat1). If the hash of th

ipfw: Implement state based "redirect", i.e. without using libalias.

Redirection creates two states, i.e. one before the translation (xlat0)
and one after the translation (xlat1). If the hash of the translated
packet indicates that it is owned by a remote CPU:
- If the packet triggers the state pair creation, the 'xlat1' will be
piggybacked by the translated packet, which will be forwarded to the
remote CPU for further evalution. And the 'xlat1' will be installed
on the remote CPU before the evalution of the translated packet.
- Else only the translated packet will be forwarded to the remote CPU
for further evalution.

The 'xlat1' is called the slave state, which will be deleted only when
the 'xlat0' (the master state) is deleted. The state pair is always
deleted on the CPU owning the 'xlat1'; the 'xlat0' will be forwarded
there.

The reference counting of the state pair is maintained independently
in each state, the memory of the state pair will be freed only after
the sum of the counter in each state reaches 0. This avoids expensive
per-packet atomic ops.

As far as I have tested, this implementation of "redirect" does _not_
introduce any noticeable performance reduction, latency increasing or
latency destability.

This commit makes most of the necessary bits for NAT ready too.

show more ...


Revision tags: v5.1.0, v5.0.0rc1
# 5204e13c 07-Aug-2017 Sepherosa Ziehau <sephe@dragonflybsd.org>

netisr: Simplify assertion related bits


# 918e8ca3 03-Aug-2017 Sepherosa Ziehau <sephe@dragonflybsd.org>

inet: ip_{output/input}() should only run in first netisr_ncpus netisrs


Revision tags: v4.8.1
# 860b6b42 20-Jun-2017 Sepherosa Ziehau <sephe@dragonflybsd.org>

loopback: Use ifclone APIs to create loopback interfaces.

This paves way for multiple FIB support.


Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1
# afd2da4d 03-Aug-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely

* Remove the PG_ZERO flag and remove all page-zeroing optimizations,
entirely. Aftering doing a substantial amount of testing, these

kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely

* Remove the PG_ZERO flag and remove all page-zeroing optimizations,
entirely. Aftering doing a substantial amount of testing, these
optimizations, which existed all the way back to CSRG BSD, no longer
provide any benefit on a modern system.

- Pre-zeroing a page only takes 80ns on a modern cpu. vm_fault overhead
in general is ~at least 1 microscond.

- Pre-zeroing a page leads to a cold-cache case on-use, forcing the fault
source (e.g. a userland program) to actually get the data from main
memory in its likely immediate use of the faulted page, reducing
performance.

- Zeroing the page at fault-time is actually more optimal because it does
not require any reading of dynamic ram and leaves the cache hot.

- Multiple synth and build tests show that active idle-time zeroing of
pages actually reduces performance somewhat and incidental allocations
of already-zerod pages (from page-table tear-downs) do not affect
performance in any meaningful way.

* Remove bcopyi() and obbcopy() -> collapse into bcopy(). These other
versions existed because bcopy() used to be specially-optimized and
could not be used in all situations. That is no longer true.

* Remove bcopy function pointer argument to m_devget(). It is no longer
used. This function existed to help support ancient drivers which might
have needed a special memory copy to read and write mapped data. It has
long been supplanted by BUSDMA.

show more ...


Revision tags: v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# ef6b578f 12-Dec-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

inet/mcast: Don't free inp_moptions in ip_setmoptions()

This memory foot print optimization does not save much memory for
us, but it could cause a lot of trouble for in_pcbladdr_find().


Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5
# 9b975f11 16-Mar-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip: Don't generate IP ID for DF IP datagrams (part of RFC6864)


Revision tags: v4.0.4
# b5523eac 19-Feb-2015 Sascha Wildner <saw@online.de>

kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT
flags was a recurring issue when porting drivers from FreeBSD

kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT
flags was a recurring issue when porting drivers from FreeBSD because
it tended to get forgotten and the code would compile anyway with the
wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags
for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK
and M_NOWAIT), it was just one big converting back and forth with some
sanitization in between.

This commit allows M_* again for the mbuf functions and keeps the
sanitizing as it was before: when M_WAITOK is among the passed flags,
objcache functions will be called with M_WAITOK and when it is absent,
they will be called with M_NOWAIT. All other flags are scrubbed by the
MB_OCFLAG() macro which does the same as the former MBTOM().

Approved-by: dillon

show more ...


# b4051e25 22-Jan-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

ifnet: Make ifnet and ifindex2ifnet MPSAFE

- Accessing to these two global variables from non-netisr threads uses
ifnet lock. This kind of accessing is from
- Accessing to ifindex2ifnet from neti

ifnet: Make ifnet and ifindex2ifnet MPSAFE

- Accessing to these two global variables from non-netisr threads uses
ifnet lock. This kind of accessing is from
- Accessing to ifindex2ifnet from netisrs are lockless MPSAFE.
- Netisrs no longer access ifnet, instead they access ifnet array as of
this commit, which is lockless MPSAFE.

Rules for accessing ifnet and ifindex2ifnet is commented near the
declaration of the related global variables/functions in net/if_var.h.

show more ...


Revision tags: v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0
# 8ba7dcb1 25-Sep-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Provide socket owner cpuid hint

If the cpuid hint could not be provided or the cpuid hint does not make
sense, -1 will be returned as cpuid hint, e.g. TCP listen sockets w/o
SO_REUSEPORT.

T

socket: Provide socket owner cpuid hint

If the cpuid hint could not be provided or the cpuid hint does not make
sense, -1 will be returned as cpuid hint, e.g. TCP listen sockets w/o
SO_REUSEPORT.

This helps avoiding unnecessary IPIs and contention on receiving sockbuf
token.

show more ...


Revision tags: v3.8.2
# be4519a2 03-Jul-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance

MAJOR CHANGES:

- Add token to protect pcbinfo's inpcb list and wildcard hash table.
Currently only udp per-cpu pcbinfo sets

udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance

MAJOR CHANGES:

- Add token to protect pcbinfo's inpcb list and wildcard hash table.
Currently only udp per-cpu pcbinfo sets this token. udp serializer
and netisr barrier are nuked.

o udp inpcb list:

Under most cases, udp inpcb list is operated in its owner netisr.
However, it is also accessed and modified (no effiective udp inpcb
will be unlinked though) in netisr0 to adjust multicast options if
one interface is to be detached. So protecting udp inpcb list
accessing and modification w/ token is necessary.

At udp inpcb detach time, the udp inpcb is first removed from the
udp inpcb list, then a message will go through all netisrs, which
makes sure that no netisrs are using or can find this udp inpcb
from the udp inpcb list. After all these, this udp inpcb is
destroyed in its owner netisr.

In netisrs, it is MP safe to find a udp inpcb from udp inpcb list,
then release the token and process the found udp inpcb.

In other threads, it is MP safe to find a udp inpcb from udp inpcb
list, then release the token and process the found udp inpcb in
non-blocking fashion.

See also the usage of inpcb marker.

o udp wildcard hash table:

On input path, udp wildcard hash table is searched in its owner
netisr. In order to ease implicit binding (bind during send),
connect after binding, and disconnect, udp inpcb are inserted
into and removed from other udp pcbinfos' wildcard hash table in
its owner netisr. Thus the udp wildcard hash table must be
protected w/ token.

At udp inpcb detach time, a message will go through all netisrs,
and this udp inpcb will be removed from the udp wildcard hash
table belonging to the current netisr. This makes sure that once
the current netisr runs the message handler, this udp inpcb will
not be used and be found in the current netisr. When the message
reaches the last netisr, this udp inpcb is redispatched to its
owner netisr to be destroyed.

In netisrs, it is MP safe to find a udp inpcb from udp wildcard
hash table, then release the token and process the found udp inpcb,
e.g. use udp inpcb found by in_pcblookuphash().

In other threads, it is MP safe to find a udp inpcb from udp
wildcard hash table, then release the token and process the found
udp inpcb in non-blocking fashion.

See also the usage of inpcb container marker.

o udp connect hash table:

It is lockless MP safe, and only accessed and modified in its owner
netisr.

- During inpcb iteration through inpcb list, use inpcb marker when
calling functions, which may block, e.g. in_pcbpurgeif0(), so the
inpcb iteration will not stop prematurely, if the inpcb being
processed is removed from the inpcb list.

- Use udp inpcb wildcard table and udp inpcb connect hash table to
dispatch input multicast and broadcast udp datagrams. Using udp inpcb
list could be time consume, since we need to check udp inpcb lists on
all cpus; and secondly, once udp inpcb has a local port, it will be in
either udp wildcard hash table or udp connect hash table.

Since the socket buffer operation on input path may block, inpcb
container marker is used when iterating inpcbs from udp inpcb wildcard
hash table. in_pcblookup_pkthash() is adjusted to skip inpcb
container marker.

- udp socket so_port is no longer fixed to netisr0 msgport
o Initial udp socket so_port is the current cpu's netisr msgport.
o Bound but unconnected udp socket so_port is selected according to
local port hash.
o Connected udp socket so_port is selected according to the udp hash,
i.e. laddr/faddr toeplitz hash (exception: multicast laddr or
multicast faddr, is hashed to netisr0).
o Multicast socket options are forced to be handled in netisr0, since
udp socket so_port may not be netisr0 msgport.

- In order to support asynchronized udp inpcb detach:
o EJUSTRETURN from pru_detach method now means protocol will call
sodiscard() and sofree() for soclose(). udp pru_detach method
returns EJUSTRETURN as of this commit.
o SS_ISCLOSING socket state is set before calling pru_detach method,
so protocol could avoid certain expensive, unnecessary or
disallowed operation in pru_disconnect or pru_detach method, e.g.
udp pru_disconnect method avoids putting udp inpcb back to udp
wildcard hash table, if SS_ISCLOSING is set.

MISC CHANGES:

- pcbinfo's cpu id must be set now; -1 is disallowed.
- udp pru_abort method should never be called; it panicks now.
- Restore traditional BSD behaviour, if unbound udp socket connect
fails: if local port of the udp socket has been selected, its inpcb
should be in wildcard hash table, i.e. the udp inpcb should be visible
on udp datagrams input path.
- Make sure multicast stuffs are adjusted only in netisr0 for inet6, if
one interface is about to be detached.

PERFORMANCE IMPROVEMENT:

For 'kq_connect_client -u' test, this commit gives 400% performance
improvement (31Kconns/s -> 160Kconns/s).

show more ...


Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc
# 3b07180d 19-May-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip_output: if_simloop is MPSAFE


# 72659ed0 13-May-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

ifnet: Properly protect if_multiaddrs using ifnet serializers

- Protect ifnet.if_multiaddrs using ifnet serializers. Add some
comment in the places, where only main serailizer is necessary.
- Fix

ifnet: Properly protect if_multiaddrs using ifnet serializers

- Protect ifnet.if_multiaddrs using ifnet serializers. Add some
comment in the places, where only main serailizer is necessary.
- Fix if_delallmulti(). Using TAILQ_FOREACH_MUTABLE is incorrect for
deleting an ifmultiaddr from ifnet.if_multiaddrs. Since deleting one
ifmultiaddr may cause additional ifmultiaddr deletion (e.g. the AF_LINK
ifmultiaddr for AF_INET ifmultiaddr).
- Change IN_LOOKUP_MULTI and IN6_LOOKUP_MULTI macros into inline
functions.
- Redispatch multicast IP packets to netisr0 for further processing.
Software based IP packet hash function is changed. And hash value
fixup for multicast IP packets is added to the beginning of ip_input();
this is mainly for IP packets, whose hash is calculated by hardware.
- For wlan's multicast hardware filter updating, we no longer need to
release wlan serializer and mess up w/ the if_ioctl setting.

In netisr0, read and test ifma_refcount for AF_INET ifmultiaddr is MPSAFE
w/o ifnet serializers, since its ifma_refcount is only altered in netisr0.

In netisr0, any operation on in_multi, which is obtained from the
corresponding ifmuliaddr's ifma_protospec, is MPSAFE w/o ifnet
serializers, since ifmultiaddr for AF_INET is only set and cleared in
netisr0.

While I'm here also redispatch IP packets w/o hash to the proper netisrs,
on ip_input() path. And unnecessary critical sections in
in_{add,del}multi() are removed.

show more ...


Revision tags: v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2
# 5b8afb6b 09-May-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip_output: Always panic if the rtentry is not owned by the current cpu

It has been on for several releases; make it mandatory now


Revision tags: v3.4.0, v3.4.1
# 2e585ead 09-Apr-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip/udp: Fix IP source address setting for multicast address bound socket

It is a common practice to bind UDP socket to multicast address to enjoy
kernel level destination multicast address and port

ip/udp: Fix IP source address setting for multicast address bound socket

It is a common practice to bind UDP socket to multicast address to enjoy
kernel level destination multicast address and port filtering. However,
if data are sent on this kind of socket, source address of the IP packet
will be the bound multicast address?!

Two fixes are added to address this bug:

1) Don't set IP source address in udp_output(), if the inpcb's laddr is
multicast address. Instead the IP source address is set to INADDR_ANY,
so ip_output() could pick up a proper IP source address.

2) With 1) in place, it is possible that IP source address is INADDR_ANY
before the ifnet.if_output() using following steps:
- If the IP_MULTICAST_IF socket option is set to iface0
- The iface0's last IP address is unset, before the ip_output()

This condition could easily be reproduced by using test/mcast:
mcast -m 224.2.2.2 -p 3000 -i iface0_ip -D 10
During the 10sec delay, wipe out all IP addresses from iface0

Well, even without 1), raw IP still could generate IP packet using
INADDR_ANY as source address.

Two checks on the source IP address are added to ip_output() before
ifnet.if_output()
- IP source address should not be INADDR_ANY
- IP source address should not be multicast address

And for multicast IP packets, if the IP source address could be
determined, they will not be looped back and forwarded.

Reported-by: zeroxia

show more ...


# 8d7e0714 09-Apr-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip_output: Record the "src was INADDR_ANY" for multicast packets

While I'm here, fix the comment on the unicast packet output path.


Revision tags: v3.4.0rc, v3.5.0
# d40991ef 13-Feb-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

if: Per-cpu ifnet/ifaddr statistics, step 1/3

Wrap ifnet/ifaddr stats updating, setting and extraction into macros;
ease upcoming changes.


12345