# b272101a | 30-Oct-2023 | Aaron LI <aly@aaronly.me>
Various minor whitespace cleanups

Accumulated along the way.

# 410f8572 | 22-Dec-2023 | Aaron LI <aly@aaronly.me>
kernel: Replace the deprecated m_copy() with m_copym()

# 8a93af2a | 08-Jul-2023 | Matthew Dillon <dillon@apollo.backplane.com>
network - Remove host-order translations of ipv4 ip_off and ip_len

* Do not translate ip_off and ip_len to host order and then back again in the network stack. The fields are now left in network order.

Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3

# c443c74f | 22-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>
<net/if_var.h>: Remove last explicit dependency on <sys/malloc.h>.

These kernel sources pass the M_NOWAIT flag to m_copym() and friends. Mark that it was for M_NOWAIT visibility.

# febebf83 | 20-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>
kernel: Minor whitespace cleanup in few sources (part 2).

Separated from next.

Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc

# bff82488 | 20-Mar-2018 | Aaron LI <aly@aaronly.me>
<net/if.h>: Do not include <net/if_var.h> for _KERNEL

* Clean up an ancient leftover: do not include <net/if_var.h> from <net/if.h> for kernel code.
* Adjust various files to include the necessary <net/if_var.h> header.

NOTE: I have also tested removing the inclusion of <net/if.h> from <net/if_var.h>, therefore add <net/if.h> inclusion for those files that need it but only included <net/if_var.h>. For some files, the header inclusion orderings are also adjusted.

# 755d70b8 | 21-Apr-2018 | Sascha Wildner <saw@online.de>
Remove IPsec and related code from the system.

It was unmaintained ever since we inherited it from FreeBSD 4.8. In fact, we had two implementations from that time: IPSEC and FAST_IPSEC. FAST_IPSEC is the implementation to which FreeBSD has moved since, but it didn't even build in DragonFly.

Fixes for dports have been committed to DeltaPorts.

Requested-by: dillon
Dports-testing-and-fixing: zrj

Revision tags: v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc

# 76a9ffca | 21-Dec-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>
ip: Set mbuf hash for output IP packets.

This paves the way to implement Flow-Queue-Codel.

Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0

# 1bdd592f | 30-May-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>
tcp: Don't prematurely drop receiving-only connections.

If the connection was persistent and receiving-only, several (12) sporadic device insufficient-buffer conditions would cause the connection to be dropped prematurely: Upon ENOBUFS in tcp_output() for an ACK, the retransmission timer is started. No one will stop this retransmission timer for a receiving-only connection, so the retransmission timer is guaranteed to expire and t_rxtshift is guaranteed to be increased. And t_rxtshift will not be reset to 0, since no RTT measurement will be done for a receiving-only connection. If this receiving-only connection lived long enough, and it suffered 12 sporadic device insufficient-buffer conditions, i.e. t_rxtshift >= 12, it would be dropped prematurely by the retransmission timer.

We now assert that for data segments, SYNs or FINs, either the rexmit or persist timer was wired up upon ENOBUFS. And we don't set the rexmit timer for the other cases, i.e. ENOBUFS upon ACKs.

And we no longer penalize the send window upon ENOBUFS.

Obtained-from: FreeBSD r300981

Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1

# 01a777f0 | 17-Jul-2015 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - MFC 160de052b2 from FreeBSD (persist timer)

Avoid a situation where we do not set the persist timer after a zero window condition. If you send a 0-length packet, but there is data in the socket buffer, and neither the rexmt nor the persist timer is already set, then activate the persist timer.

Author: hiren <hiren@FreeBSD.org>
Taken-from: FreeBSD

Revision tags: v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4

# b5523eac | 19-Feb-2015 | Sascha Wildner <saw@online.de>
kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD, because it tended to get forgotten and the code would compile anyway with the wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist() call (which use M_WAITOK and M_NOWAIT), it was just one big converting back and forth with some sanitization in between.

This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK, and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro, which does the same as the former MBTOM().

Approved-by: dillon

Revision tags: v4.0.3, v4.0.2

# b92efbf5 | 25-Dec-2014 | Sepherosa Ziehau <sephe@dragonflybsd.org>
tcp: Enable path mtu discovery by default

This also eases the adoption of RFC 6864.

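As a sketch, the corresponding knob can be inspected or overridden with sysctl(8). The knob name net.inet.tcp.path_mtu_discovery is an assumption based on the FreeBSD-derived naming convention; it is not stated in the commit message itself.

```shell
# Check whether path MTU discovery is enabled (assumed knob name;
# 1 = enabled, which is the default after this commit).
sysctl net.inet.tcp.path_mtu_discovery

# Opt out if a broken middlebox drops ICMP "fragmentation needed" messages.
sysctl net.inet.tcp.path_mtu_discovery=0
```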
# 727ccde8 | 18-Dec-2014 | Sepherosa Ziehau <sephe@dragonflybsd.org>
inet/inet6: Remove the v4-mapped address support

This greatly simplifies the code (even the IPv4 code) and avoids all kinds of possible port theft.

INPCB:
- Nuke IN6P_IPV6_V6ONLY, which is always on after this commit.
- Change inp_vflag into inp_af (AF_INET or AF_INET6), since the socket is either IPv6 or IPv4, but never both. Set inpcb.inp_af in in_pcballoc() instead of in every pru_attach method. Add INP_ISIPV4() and INP_ISIPV6() macros to check the inpcb family (the socket family and inpcb.inp_af are the same).
- Nuke the convoluted code in in_pcbbind() and in6_pcbbind() which is used to allow wildcard binding to accept IPv4 connections on IPv6 wildcard-bound sockets.
- Nuke the code in in_pcblookup_pkthash() to match IPv4 faddr with an IPv6 wildcard-bound socket.
- Nuke in6_mapped_{peeraddr,sockaddr,savefaddr}(); use in6_{setpeeraddr,setsockaddr,savefaddr}() directly.
- Nuke the v4-mapped address conversion functions.
- Don't allow binding to a v4-mapped address in in6_pcbbind().
- Don't allow connecting to a v4-mapped address in in6_pcbconnect().

TCP:
- Nuke the code in tcp_output() which takes care of the IP header TTL setting for v4-mapped IPv6 sockets.
- Don't allow binding to a v4-mapped address (through in6_pcbbind()).
- Don't allow connecting to a v4-mapped address and nuke the related code (PRUC_NAMALLOC etc.).
- Nuke the code (PRUC_FALLBACK etc.) to fall back to an IPv4 connection if the IPv6 connection fails, which is wrong.
- Nuke the code for v4-mapped IPv6 sockets in tcp6_soport().

UDP:
- Nuke the code for v4-mapped IPv6 sockets in udp_input() and udp_append().
- Don't allow binding to a v4-mapped address (through in6_pcbbind()).
- Don't allow connecting to a v4-mapped address.
- Don't allow sending datagrams to a v4-mapped address and nuke the related code in udp6_output().
- Nuke the code for v4-mapped IPv6 sockets in udp6_disconnect().

RIP:
- Don't allow sending packets to a v4-mapped address.
- Don't allow binding to a v4-mapped address.
- Don't allow connecting to a v4-mapped address.

Misc fixup:
- Don't force the rip pru_attach method to return 0. If in_pcballoc() fails, just return the error code.

Revision tags: v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2

# b0c17823 | 16-Jul-2014 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Add feature to allow sendbuf_auto to decrease the buffer size

* sysctl net.inet.tcp.sendbuf_auto (defaults to 1) is now able to decrease the tcp buffer size as well as increase it.
* Inflight bwnd data is used to determine how much to decrease the buffer. Inflight is enabled by default. If you disable it with (net.inet.tcp.inflight_enable=0), sendbuf_auto will not be able to adjust buffer sizes down.
* Set net.inet.tcp.sendbuf_min (default 32768) to set the floor for any downward adjustment.
* Set net.inet.tcp.sendbuf_auto=2 to disable the decrease feature.

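The knobs above combine as in the following sketch; the sysctl names and default values are taken from the commit message itself, so only the chosen values are illustrative.

```shell
# Enable send-buffer auto-sizing in both directions (1 is the default;
# set to 2 to keep auto-grow but disable auto-shrink).
sysctl net.inet.tcp.sendbuf_auto=1

# Downward adjustment relies on inflight bandwidth (bwnd) data, so
# keep inflight enabled (setting this to 0 disables auto-shrink).
sysctl net.inet.tcp.inflight_enable=1

# Floor, in bytes, for any downward adjustment of the send buffer.
sysctl net.inet.tcp.sendbuf_min=32768
```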
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0

# cec73927 | 05-Sep-2013 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Change time_second to time_uptime for all expiration calculations

* Vet the entire kernel and change use cases for expiration calculations using time_second to use time_uptime instead.
* Protects these expiration calculations from step changes in the wall time, particularly needed for route table entries.
* Probably requires further variable type adjustments, but with the use of time_uptime instead of time_second it is highly unlikely to ever overrun any demotions to int still present.

Revision tags: v3.4.3

# 4cc8caef | 08-Jun-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
altq: Implement two level "rough" priority queue for plain sub-queue

The "rough" part comes from two sources:
- The hardware queue could be deep, normally 512 or more even for GigE.
- Round robin on the transmission queues is used by all of the multiple-transmission-queue capable hardware supported by DragonFly as of this commit.
These two sources affect the packet priority set by DragonFly.

DragonFly's "rough" priority queue has only two levels, i.e. high priority and normal priority, which should be enough. Each queue has its own header. The normal priority queue will be dequeued only when there are no packets in the high priority queue. During enqueue, if the sub-queue is full and the high priority queue length is less than half of the sub-queue length (both packet count and byte count), drop-head will be applied on the normal priority queue.

The M_PRIO mbuf flag is added to mark that the mbuf is destined for the high priority queue. Currently TCP uses it to prioritize SYN, SYN|ACK, and pure ACKs w/o FIN and RST. This behaviour can be turned off via net.inet.tcp.prio_synack, which is on by default.

The performance improvement!

The test environment: all three boxes are using Intel i7-2600 w/ HT enabled. Box A (bnx0) is connected to box B (emx1, running TCP_MAERTS) and to box C (emx1, running TCP_STREAM/TCP_RR).

A's kernel has this commit compiled. bnx0 has all four transmission queues enabled. For bnx0, the hardware's transmission queue round-robin is on TSO segment boundary.

Some baseline measurements:
B<--A TCP_MAERTS (raw stats) (128 clients): 984 Mbps (tcp_stream -H A -l 15 -i 128 -r)
C-->A TCP_STREAM (128 clients):             942 Mbps (tcp_stream -H A -l 15 -i 128)
C-->A TCP_CC (768 clients):                 221199 conns/s (tcp_cc -H A -l 15 -i 768)

To effectively measure the TCP_CC, the prefix route's MSL is changed to 10ms: route change 10.1.0.0/24 -msl 10

All stats gathered in the following measurements are below the baseline measurements (well, they should be).

C-->A TCP_CC improvement, while B<--A TCP_MAERTS is running:
                        TCP_MAERTS(raw)  TCP_CC
TSO     prio_synack=1   948 Mbps         15988 conns/s
TSO     prio_synack=0   965 Mbps         8867 conns/s
non-TSO prio_synack=1   943 Mbps         18128 conns/s
non-TSO prio_synack=0   959 Mbps         11371 conns/s

* 80% TCP_CC performance improvement w/ TSO and 60% w/o TSO!

C-->A TCP_STREAM improvement, while B<--A TCP_MAERTS is running:
                        TCP_MAERTS(raw)  TCP_STREAM
TSO     prio_synack=1   969 Mbps         920 Mbps
TSO     prio_synack=0   969 Mbps         865 Mbps
non-TSO prio_synack=1   969 Mbps         920 Mbps
non-TSO prio_synack=0   969 Mbps         879 Mbps

* 6% TCP_STREAM performance improvement w/ TSO and 4% w/o TSO.

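As a sketch, the two knobs exercised in the measurements above can be toggled like this; both commands are quoted from the commit message, and the 10.1.0.0/24 prefix is the test network from that setup, not a general recommendation.

```shell
# Toggle prioritization of SYN, SYN|ACK and pure ACKs via the M_PRIO
# path (on by default; set to 0 to disable, as in the comparison runs).
sysctl net.inet.tcp.prio_synack=1

# Shorten the test prefix route's MSL to 10ms so connection-rate (TCP_CC)
# measurements are not throttled by TIME_WAIT, as done in the test setup.
route change 10.1.0.0/24 -msl 10
```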
# dc71b7ab | 31-May-2013 | Justin C. Sherrill <justin@shiningsilence.com>
Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadler.com>

Revision tags: v3.4.2

# 2702099d | 06-May-2013 | Justin C. Sherrill <justin@shiningsilence.com>
Remove advertising clause from all that isn't contrib or userland bin.

By: Eitan Adler <lists@eitanadler.com>

# 5337421c | 02-May-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
netisr: Inline netisr_cpuport() and netisr_curport()

These two functions do nothing more than return a pointer to an element in the array.

Per our header file naming convention, put these two functions in net/netisr2.h.

# ec7f7fc8 | 28-Apr-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
netisr: Function renaming; no functional changes

This cleans up code for keeping input packets' hash instead of masking the hash with ncpus2_mask. netisr_hashport(), which maps a packet hash to a netisr port, will be added soon.

Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0

# 6999cd81 | 26-Feb-2013 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Beef up lwkt_dropmsg() API and fix deadlock in so_async_rcvd*()

* Beef up the lwkt_dropmsg() API. The API now conditionally returns success (0) or an error (ENOENT).
* so_pru_rcvd_async() improperly calls lwkt_sendmsg() with a spinlock held. This is not legal. Hack up lwkt_sendmsg() a bit to resolve.

# d3d26ea5 | 23-Jan-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
tcp: Add comment about "fairsend"

# 2fb3a851 | 17-Jan-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
tcp: Improve sender-sender and sender-receiver fairness on the same netisr

Yield to other senders or receivers on the same netisr if the current TCP stream has sent a certain amount of segments (currently 4) and is going to burst more segments. sysctl net.inet.tcp.fairsend can be used to tune how many segments are allowed to burst. For TSO-capable devices, their TSO aggregation size limit can also affect the number of segments allowed to burst. Setting net.inet.tcp.fairsend to 0 allows a single TCP stream to burst as much as it wants (the old TCP sender behaviour).

"Fairsend" is performed at the places that do not affect segment sending during congestion control:
- User-requested output path
- ACK input path

Measured improvement in the following setup: box A (i7-2600, w/ HT enabled, 82571EB) receives from box B (E3-1230, w/ HT enabled, 82574L) and sends to box C (E3-1230, w/ HT enabled, 82574L). The performance stats are gathered from 'systat -if 1'.

When A runs 8 TCP senders to C and 8 TCP receivers from B, sending performance is the same ~975Mbps; however, the receiving performance before this commit stumbles between 670Mbps and 850Mbps, while w/ "fairsend" it stays at 981Mbps.

When A runs 16 TCP senders to C and 16 TCP receivers from B, sending performance is the same ~975Mbps; however, the receiving performance before this commit goes from 960Mbps to 980Mbps, while w/ "fairsend" it stays stably at 981Mbps.

When there are more senders and receivers running on A, there is no noticeable performance difference on either sending or receiving between non-"fairsend" and "fairsend", because senders are no longer able to do continuous large bursts.

"Fairsend" also improves Jain's fairness index between various amounts of senders (8 ~ 128) a little bit (sending-only tests).

# e41e61d5 | 16-Jan-2013 | Sepherosa Ziehau <sephe@dragonflybsd.org>
tcp/tso: Add per-device TSO aggregation size limit

- Prevent possible TSO large bursts when they are inappropriate (plenty of >24 segment bursts were observed, even when 32 parallel sending TCP streams are running on the same GigE NIC). A TSO large burst has the following drawbacks on a single TX queue, even on devices that are capable of multiple TX queues:
  o Delays other senders' packet transmission quite a lot.
  o Has a negative effect on TCP receivers, which send ACKs.
  o Causes buffer bloat in software sending queues, whose upper limit is based on "packet count".
  o The packet scheduler's decisions could be less effective.
  On the other hand, TSO large bursts could improve CPU usage.
- Improve fairness between multiple TX queues on devices that are capable of multiple TX queues but only fetch data on TSO large packet boundaries instead of TCP segment boundaries.

Drivers could supply their own TSO aggregation size limit. If a driver does not set it, the default value is 6000 (4 segments if the MTU is 1500). The default value increases CPU usage a little bit: on i7-2600 w/ HT enabled, single TCP sending stream, CPU usage increases from 14%~17% to 17%~20%.

Users can configure the TSO aggregation size limit by using ifconfig(8): ifconfig ifaceX tsolen _n_

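A minimal usage sketch of the ifconfig(8) knob described above; the interface name emx0 and the value 9000 are illustrative placeholders, not values from the commit.

```shell
# Raise the per-device TSO aggregation size limit to 9000 bytes,
# i.e. 6 full-size segments at an MTU of 1500 (default is 6000).
ifconfig emx0 tsolen 9000

# Inspect the interface to confirm the new setting.
ifconfig emx0
```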
# 4f483122 | 07-Jan-2013 | Sascha Wildner <saw@online.de>
kernel/tcp_{input,output}: Remove some unused variables.