ip_output.c - OpenGrok history log for /dragonfly/sys/netinet/ip

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 2976dea7	03-Mar-2024	Aaron LI <aly@aaronly.me>	sys: Minor fixes to some comments
# 410f8572	22-Dec-2023	Aaron LI <aly@aaronly.me>	kernel: Replace the deprecated m_copy() with m_copym()
# 2b3f93ea	13-Oct-2023	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Add per-process capability-based restrictions * This new system allows userland to set capability restrictions which turns off numerous kernel features and root accesses. These restricti kernel - Add per-process capability-based restrictions * This new system allows userland to set capability restrictions which turns off numerous kernel features and root accesses. These restrictions are inherited by sub-processes recursively. Once set, restrictions cannot be removed. Basic restrictions that mimic an unadorned jail can be enabled without creating a jail, but generally speaking real security also requires creating a chrooted filesystem topology, and a jail is still needed to really segregate processes from each other. If you do so, however, you can (for example) disable mount/umount and most global root-only features. * Add new system calls and a manual page for syscap_get(2) and syscap_set(2) * Add sys/caps.h * Add the "setcaps" userland utility and manual page. * Remove priv.9 and the priv_check infrastructure, replacing it with a newly designed caps infrastructure. * The intention is to add path restriction lists and similar features to improve jailess security in the near future, and to optimize the priv_check code. show more ...
# 8a93af2a	08-Jul-2023	Matthew Dillon <dillon@apollo.backplane.com>	network - Remove host-order translations of ipv4 ip_off and ip_len * Do not translate ip_off and ip_len to host order and then back again in the network stack. The fields are now left in network network - Remove host-order translations of ipv4 ip_off and ip_len * Do not translate ip_off and ip_len to host order and then back again in the network stack. The fields are now left in network order. show more ...
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0
# 1926f587	07-Dec-2020	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip: Allow ip_mreqn support for IP_MULTICAST_IF,IP_{ADD,DROP}_MEMBERSHIP - ip_mreqn.imr_ifindex takes precendence over ip_mreqn.imr_address, if it's not 0. This strictly follows what Linux does. - ip: Allow ip_mreqn support for IP_MULTICAST_IF,IP_{ADD,DROP}_MEMBERSHIP - ip_mreqn.imr_ifindex takes precendence over ip_mreqn.imr_address, if it's not 0. This strictly follows what Linux does. - Allow ip_mreq for IP_MULTICAST_IF as what Linux does. While I'm here, remove unnecessary critical section among IP_MULTICAST_IF, IP_{ADD,DROP}_MEMBERSHIP. Bump kernel version. Requested-by: tuxillo show more ...
# 544d23f4	04-Dec-2020	Sepherosa Ziehau <sephe@dragonflybsd.org>	inet: Port IPPROTO/IP_RECVTOS from FreeBSD. Bump kernel version. Requested-by: zrj
Revision tags: v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1
# 755d70b8	21-Apr-2018	Sascha Wildner <saw@online.de>	Remove IPsec and related code from the system. It was unmaintained ever since we inherited it from FreeBSD 4.8. In fact, we had two implementations from that time: IPSEC and FAST_IPSEC. FAST_IPSEC Remove IPsec and related code from the system. It was unmaintained ever since we inherited it from FreeBSD 4.8. In fact, we had two implementations from that time: IPSEC and FAST_IPSEC. FAST_IPSEC is the implementation to which FreeBSD has moved since, but it didn't even build in DragonFly. Fixes for dports have been committed to DeltaPorts. Requested-by: dillon Dports-testing-and-fixing: zrj show more ...
Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2
# 06937ef9	25-Nov-2017	Sascha Wildner <saw@online.de>	Remove faith(4) and faithd(8) from the tree. FreeBSD did that 3 years ago (r274331). Quoting from their commit msg: -----8<----- It looks like industry have chosen different (and more traditional) Remove faith(4) and faithd(8) from the tree. FreeBSD did that 3 years ago (r274331). Quoting from their commit msg: -----8<----- It looks like industry have chosen different (and more traditional) stateless/stateful NAT64 as translation mechanism. Last non-trivial commits to both faith(4) and faithd(8) happened more than 12 years ago, so I assume it is time to drop RFC3142 in FreeBSD. ----->8----- Some more info here: https://lists.freebsd.org/pipermail/freebsd-net/2014-October/040224.html Discussed-with: sephe show more ...
Revision tags: v5.0.1, v5.0.0, v5.0.0rc2
# e622598e	30-Sep-2017	Sepherosa Ziehau <sephe@dragonflybsd.org>	ipfw: Implement state based "redirect", i.e. without using libalias. Redirection creates two states, i.e. one before the translation (xlat0) and one after the translation (xlat1). If the hash of th ipfw: Implement state based "redirect", i.e. without using libalias. Redirection creates two states, i.e. one before the translation (xlat0) and one after the translation (xlat1). If the hash of the translated packet indicates that it is owned by a remote CPU: - If the packet triggers the state pair creation, the 'xlat1' will be piggybacked by the translated packet, which will be forwarded to the remote CPU for further evalution. And the 'xlat1' will be installed on the remote CPU before the evalution of the translated packet. - Else only the translated packet will be forwarded to the remote CPU for further evalution. The 'xlat1' is called the slave state, which will be deleted only when the 'xlat0' (the master state) is deleted. The state pair is always deleted on the CPU owning the 'xlat1'; the 'xlat0' will be forwarded there. The reference counting of the state pair is maintained independently in each state, the memory of the state pair will be freed only after the sum of the counter in each state reaches 0. This avoids expensive per-packet atomic ops. As far as I have tested, this implementation of "redirect" does _not_ introduce any noticeable performance reduction, latency increasing or latency destability. This commit makes most of the necessary bits for NAT ready too. show more ...
Revision tags: v5.1.0, v5.0.0rc1
# 5204e13c	07-Aug-2017	Sepherosa Ziehau <sephe@dragonflybsd.org>	netisr: Simplify assertion related bits
# 918e8ca3	03-Aug-2017	Sepherosa Ziehau <sephe@dragonflybsd.org>	inet: ip_{output/input}() should only run in first netisr_ncpus netisrs
Revision tags: v4.8.1
# 860b6b42	20-Jun-2017	Sepherosa Ziehau <sephe@dragonflybsd.org>	loopback: Use ifclone APIs to create loopback interfaces. This paves way for multiple FIB support.
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1
# afd2da4d	03-Aug-2016	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely * Remove the PG_ZERO flag and remove all page-zeroing optimizations, entirely. Aftering doing a substantial amount of testing, these kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely * Remove the PG_ZERO flag and remove all page-zeroing optimizations, entirely. Aftering doing a substantial amount of testing, these optimizations, which existed all the way back to CSRG BSD, no longer provide any benefit on a modern system. - Pre-zeroing a page only takes 80ns on a modern cpu. vm_fault overhead in general is ~at least 1 microscond. - Pre-zeroing a page leads to a cold-cache case on-use, forcing the fault source (e.g. a userland program) to actually get the data from main memory in its likely immediate use of the faulted page, reducing performance. - Zeroing the page at fault-time is actually more optimal because it does not require any reading of dynamic ram and leaves the cache hot. - Multiple synth and build tests show that active idle-time zeroing of pages actually reduces performance somewhat and incidental allocations of already-zerod pages (from page-table tear-downs) do not affect performance in any meaningful way. * Remove bcopyi() and obbcopy() -> collapse into bcopy(). These other versions existed because bcopy() used to be specially-optimized and could not be used in all situations. That is no longer true. * Remove bcopy function pointer argument to m_devget(). It is no longer used. This function existed to help support ancient drivers which might have needed a special memory copy to read and write mapped data. It has long been supplanted by BUSDMA. show more ...
Revision tags: v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# ef6b578f	12-Dec-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	inet/mcast: Don't free inp_moptions in ip_setmoptions() This memory foot print optimization does not save much memory for us, but it could cause a lot of trouble for in_pcbladdr_find().
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5
# 9b975f11	16-Mar-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip: Don't generate IP ID for DF IP datagrams (part of RFC6864)
Revision tags: v4.0.4
# b5523eac	19-Feb-2015	Sascha Wildner <saw@online.de>	kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions. The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions. The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD because it tended to get forgotten and the code would compile anyway with the wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK and M_NOWAIT), it was just one big converting back and forth with some sanitization in between. This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro which does the same as the former MBTOM(). Approved-by: dillon show more ...
# b4051e25	22-Jan-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	ifnet: Make ifnet and ifindex2ifnet MPSAFE - Accessing to these two global variables from non-netisr threads uses ifnet lock. This kind of accessing is from - Accessing to ifindex2ifnet from neti ifnet: Make ifnet and ifindex2ifnet MPSAFE - Accessing to these two global variables from non-netisr threads uses ifnet lock. This kind of accessing is from - Accessing to ifindex2ifnet from netisrs are lockless MPSAFE. - Netisrs no longer access ifnet, instead they access ifnet array as of this commit, which is lockless MPSAFE. Rules for accessing ifnet and ifindex2ifnet is commented near the declaration of the related global variables/functions in net/if_var.h. show more ...
Revision tags: v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0
# 8ba7dcb1	25-Sep-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Provide socket owner cpuid hint If the cpuid hint could not be provided or the cpuid hint does not make sense, -1 will be returned as cpuid hint, e.g. TCP listen sockets w/o SO_REUSEPORT. T socket: Provide socket owner cpuid hint If the cpuid hint could not be provided or the cpuid hint does not make sense, -1 will be returned as cpuid hint, e.g. TCP listen sockets w/o SO_REUSEPORT. This helps avoiding unnecessary IPIs and contention on receiving sockbuf token. show more ...
Revision tags: v3.8.2
# be4519a2	03-Jul-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance MAJOR CHANGES: - Add token to protect pcbinfo's inpcb list and wildcard hash table. Currently only udp per-cpu pcbinfo sets udp: Make udp pcbinfo and portinfo per-cpu; greatly improve performance MAJOR CHANGES: - Add token to protect pcbinfo's inpcb list and wildcard hash table. Currently only udp per-cpu pcbinfo sets this token. udp serializer and netisr barrier are nuked. o udp inpcb list: Under most cases, udp inpcb list is operated in its owner netisr. However, it is also accessed and modified (no effiective udp inpcb will be unlinked though) in netisr0 to adjust multicast options if one interface is to be detached. So protecting udp inpcb list accessing and modification w/ token is necessary. At udp inpcb detach time, the udp inpcb is first removed from the udp inpcb list, then a message will go through all netisrs, which makes sure that no netisrs are using or can find this udp inpcb from the udp inpcb list. After all these, this udp inpcb is destroyed in its owner netisr. In netisrs, it is MP safe to find a udp inpcb from udp inpcb list, then release the token and process the found udp inpcb. In other threads, it is MP safe to find a udp inpcb from udp inpcb list, then release the token and process the found udp inpcb in non-blocking fashion. See also the usage of inpcb marker. o udp wildcard hash table: On input path, udp wildcard hash table is searched in its owner netisr. In order to ease implicit binding (bind during send), connect after binding, and disconnect, udp inpcb are inserted into and removed from other udp pcbinfos' wildcard hash table in its owner netisr. Thus the udp wildcard hash table must be protected w/ token. At udp inpcb detach time, a message will go through all netisrs, and this udp inpcb will be removed from the udp wildcard hash table belonging to the current netisr. This makes sure that once the current netisr runs the message handler, this udp inpcb will not be used and be found in the current netisr. When the message reaches the last netisr, this udp inpcb is redispatched to its owner netisr to be destroyed. In netisrs, it is MP safe to find a udp inpcb from udp wildcard hash table, then release the token and process the found udp inpcb, e.g. use udp inpcb found by in_pcblookuphash(). In other threads, it is MP safe to find a udp inpcb from udp wildcard hash table, then release the token and process the found udp inpcb in non-blocking fashion. See also the usage of inpcb container marker. o udp connect hash table: It is lockless MP safe, and only accessed and modified in its owner netisr. - During inpcb iteration through inpcb list, use inpcb marker when calling functions, which may block, e.g. in_pcbpurgeif0(), so the inpcb iteration will not stop prematurely, if the inpcb being processed is removed from the inpcb list. - Use udp inpcb wildcard table and udp inpcb connect hash table to dispatch input multicast and broadcast udp datagrams. Using udp inpcb list could be time consume, since we need to check udp inpcb lists on all cpus; and secondly, once udp inpcb has a local port, it will be in either udp wildcard hash table or udp connect hash table. Since the socket buffer operation on input path may block, inpcb container marker is used when iterating inpcbs from udp inpcb wildcard hash table. in_pcblookup_pkthash() is adjusted to skip inpcb container marker. - udp socket so_port is no longer fixed to netisr0 msgport o Initial udp socket so_port is the current cpu's netisr msgport. o Bound but unconnected udp socket so_port is selected according to local port hash. o Connected udp socket so_port is selected according to the udp hash, i.e. laddr/faddr toeplitz hash (exception: multicast laddr or multicast faddr, is hashed to netisr0). o Multicast socket options are forced to be handled in netisr0, since udp socket so_port may not be netisr0 msgport. - In order to support asynchronized udp inpcb detach: o EJUSTRETURN from pru_detach method now means protocol will call sodiscard() and sofree() for soclose(). udp pru_detach method returns EJUSTRETURN as of this commit. o SS_ISCLOSING socket state is set before calling pru_detach method, so protocol could avoid certain expensive, unnecessary or disallowed operation in pru_disconnect or pru_detach method, e.g. udp pru_disconnect method avoids putting udp inpcb back to udp wildcard hash table, if SS_ISCLOSING is set. MISC CHANGES: - pcbinfo's cpu id must be set now; -1 is disallowed. - udp pru_abort method should never be called; it panicks now. - Restore traditional BSD behaviour, if unbound udp socket connect fails: if local port of the udp socket has been selected, its inpcb should be in wildcard hash table, i.e. the udp inpcb should be visible on udp datagrams input path. - Make sure multicast stuffs are adjusted only in netisr0 for inet6, if one interface is about to be detached. PERFORMANCE IMPROVEMENT: For 'kq_connect_client -u' test, this commit gives 400% performance improvement (31Kconns/s -> 160Kconns/s). show more ...
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc
# 3b07180d	19-May-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip_output: if_simloop is MPSAFE
# 72659ed0	13-May-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	ifnet: Properly protect if_multiaddrs using ifnet serializers - Protect ifnet.if_multiaddrs using ifnet serializers. Add some comment in the places, where only main serailizer is necessary. - Fix ifnet: Properly protect if_multiaddrs using ifnet serializers - Protect ifnet.if_multiaddrs using ifnet serializers. Add some comment in the places, where only main serailizer is necessary. - Fix if_delallmulti(). Using TAILQ_FOREACH_MUTABLE is incorrect for deleting an ifmultiaddr from ifnet.if_multiaddrs. Since deleting one ifmultiaddr may cause additional ifmultiaddr deletion (e.g. the AF_LINK ifmultiaddr for AF_INET ifmultiaddr). - Change IN_LOOKUP_MULTI and IN6_LOOKUP_MULTI macros into inline functions. - Redispatch multicast IP packets to netisr0 for further processing. Software based IP packet hash function is changed. And hash value fixup for multicast IP packets is added to the beginning of ip_input(); this is mainly for IP packets, whose hash is calculated by hardware. - For wlan's multicast hardware filter updating, we no longer need to release wlan serializer and mess up w/ the if_ioctl setting. In netisr0, read and test ifma_refcount for AF_INET ifmultiaddr is MPSAFE w/o ifnet serializers, since its ifma_refcount is only altered in netisr0. In netisr0, any operation on in_multi, which is obtained from the corresponding ifmuliaddr's ifma_protospec, is MPSAFE w/o ifnet serializers, since ifmultiaddr for AF_INET is only set and cleared in netisr0. While I'm here also redispatch IP packets w/o hash to the proper netisrs, on ip_input() path. And unnecessary critical sections in in_{add,del}multi() are removed. show more ...
Revision tags: v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2
# 5b8afb6b	09-May-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip_output: Always panic if the rtentry is not owned by the current cpu It has been on for several releases; make it mandatory now
Revision tags: v3.4.0, v3.4.1
# 2e585ead	09-Apr-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip/udp: Fix IP source address setting for multicast address bound socket It is a common practice to bind UDP socket to multicast address to enjoy kernel level destination multicast address and port ip/udp: Fix IP source address setting for multicast address bound socket It is a common practice to bind UDP socket to multicast address to enjoy kernel level destination multicast address and port filtering. However, if data are sent on this kind of socket, source address of the IP packet will be the bound multicast address?! Two fixes are added to address this bug: 1) Don't set IP source address in udp_output(), if the inpcb's laddr is multicast address. Instead the IP source address is set to INADDR_ANY, so ip_output() could pick up a proper IP source address. 2) With 1) in place, it is possible that IP source address is INADDR_ANY before the ifnet.if_output() using following steps: - If the IP_MULTICAST_IF socket option is set to iface0 - The iface0's last IP address is unset, before the ip_output() This condition could easily be reproduced by using test/mcast: mcast -m 224.2.2.2 -p 3000 -i iface0_ip -D 10 During the 10sec delay, wipe out all IP addresses from iface0 Well, even without 1), raw IP still could generate IP packet using INADDR_ANY as source address. Two checks on the source IP address are added to ip_output() before ifnet.if_output() - IP source address should not be INADDR_ANY - IP source address should not be multicast address And for multicast IP packets, if the IP source address could be determined, they will not be looped back and forwarded. Reported-by: zeroxia show more ...
# 8d7e0714	09-Apr-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	ip_output: Record the "src was INADDR_ANY" for multicast packets While I'm here, fix the comment on the unicast packet output path.
Revision tags: v3.4.0rc, v3.5.0
# d40991ef	13-Feb-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	if: Per-cpu ifnet/ifaddr statistics, step 1/3 Wrap ifnet/ifaddr stats updating, setting and extraction into macros; ease upcoming changes.
12 3 4 5