History log of /dragonfly/sys/netinet/tcp_output.c (Results 1 – 25 of 90)
Revision Date Author Comments
# b272101a 30-Oct-2023 Aaron LI <aly@aaronly.me>

Various minor whitespace cleanups

Accumulated along the way.


# 410f8572 22-Dec-2023 Aaron LI <aly@aaronly.me>

kernel: Replace the deprecated m_copy() with m_copym()


# 8a93af2a 08-Jul-2023 Matthew Dillon <dillon@apollo.backplane.com>

network - Remove host-order translations of ipv4 ip_off and ip_len

* Do not translate ip_off and ip_len to host order and then back again
  in the network stack. The fields are now left in network order.


Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# c443c74f 22-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

<net/if_var.h>: Remove last explicit dependency on <sys/malloc.h>.

These kernel sources pass M_NOWAIT flag to m_copym() and friends.
Mark that it was for M_NOWAIT visibility.


# febebf83 20-Oct-2019 zrj <rimvydas.jasinskas@gmail.com>

kernel: Minor whitespace cleanup in few sources (part 2).

Separated from next.


Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc
# bff82488 20-Mar-2018 Aaron LI <aly@aaronly.me>

<net/if.h>: Do not include <net/if_var.h> for _KERNEL

* Clean up an ancient leftover: do not include <net/if_var.h> from <net/if.h>
  for kernel stuff.

* Adjust various files to include the necessary <net/if_var.h> header.

NOTE:
I have also tested removing the inclusion of <net/if.h> from <net/if_var.h>,
therefore added <net/if.h> inclusion for those files that need it but only
included <net/if_var.h>. For some files, the header inclusion orderings were
also adjusted.


# 755d70b8 21-Apr-2018 Sascha Wildner <saw@online.de>

Remove IPsec and related code from the system.

It was unmaintained ever since we inherited it from FreeBSD 4.8.

In fact, we had two implementations from that time: IPSEC and FAST_IPSEC.
FAST_IPSEC is the implementation to which FreeBSD has moved since, but
it didn't even build in DragonFly.

Fixes for dports have been committed to DeltaPorts.

Requested-by: dillon
Dports-testing-and-fixing: zrj


Revision tags: v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 76a9ffca 21-Dec-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

ip: Set mbuf hash for output IP packets.

This paves the way to implement Flow-Queue-Codel.


Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 1bdd592f 30-May-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp: Don't prematurely drop receiving-only connections.

If the connection was persistent and receiving-only, several (12)
sporadic device insufficient-buffer events would cause the connection
to be dropped prematurely:
Upon ENOBUFS in tcp_output() for an ACK, the retransmission timer is
started. No one will stop this retransmission timer for a receiving-
only connection, so the retransmission timer is guaranteed to expire
and t_rxtshift is guaranteed to be increased. And t_rxtshift will not
be reset to 0, since no RTT measurement will be done for a receiving-
only connection. If this receiving-only connection lived long enough,
and it suffered 12 sporadic device insufficient-buffer events, i.e.
t_rxtshift >= 12, it would be dropped prematurely by the
retransmission timer.

We now assert that for data segments, SYNs or FINs either the rexmit
or persist timer was wired upon ENOBUFS. And we don't set the rexmit
timer for other cases, i.e. ENOBUFS upon ACKs.

And we no longer penalize the send window upon ENOBUFS.

Obtained-from: FreeBSD r300981
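The timer rule described in this commit message can be sketched as follows. This is an illustrative C fragment under assumed names, not the actual tcp_output() code: on ENOBUFS, only a segment carrying data, SYN or FIN should leave a (rexmit or persist) timer armed; a pure ACK must not, or a receiving-only connection accumulates t_rxtshift until it is dropped.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (names are ours): decide whether an ENOBUFS
 * return from the output path should arm the retransmission timer.
 */
static bool
should_arm_rexmt_on_enobufs(long payload_len, bool syn, bool fin,
    bool persist_timer_armed)
{
	/* Data, SYN or FIN: either rexmit or persist must be wired. */
	if (payload_len > 0 || syn || fin)
		return !persist_timer_armed;
	/* Pure ACK: leave timers alone; the peer will re-ACK as needed. */
	return false;
}
```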


Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1
# 01a777f0 17-Jul-2015 Matthew Dillon <dillon@apollo.backplane.com>

kernel - MFC 160de052b2 from FreeBSD (persist timer)

Avoid a situation where we do not set the persist timer after a zero-window
condition. If you send a 0-length packet, but there is data in the socket
buffer, and neither the rexmt nor the persist timer is already set, then
activate the persist timer.

Author: hiren <hiren@FreeBSD.org>
Taken-from: FreeBSD
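The condition this commit adds reduces to a single predicate. A minimal sketch with invented names, not the MFC'd code itself:

```c
#include <stdbool.h>

/*
 * Illustrative predicate: after sending a zero-length segment, if data
 * remains in the socket buffer and neither the rexmit nor the persist
 * timer is running, the persist timer must be started so the zero
 * window is eventually probed.
 */
static bool
must_start_persist(long sent_len, long sb_cc, bool rexmt_armed,
    bool persist_armed)
{
	return sent_len == 0 && sb_cc > 0 && !rexmt_armed && !persist_armed;
}
```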


Revision tags: v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4
# b5523eac 19-Feb-2015 Sascha Wildner <saw@online.de>

kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT
flags was a recurring issue when porting drivers from FreeBSD because
it tended to get forgotten and the code would compile anyway with the
wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags
for an objcache_get() or objcache_reclaimlist() call (which use M_WAITOK
and M_NOWAIT), it was just one big converting back and forth with some
sanitization in between.

This commit allows M_* again for the mbuf functions and keeps the
sanitizing as it was before: when M_WAITOK is among the passed flags,
objcache functions will be called with M_WAITOK and when it is absent,
they will be called with M_NOWAIT. All other flags are scrubbed by the
MB_OCFLAG() macro, which does the same as the former MBTOM().

Approved-by: dillon
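The scrubbing described above is essentially a one-line macro. An illustrative reimplementation (the real macro is MB_OCFLAG() in the DragonFly tree; the flag values below are placeholders, not the kernel's):

```c
/* Placeholder flag values for illustration only. */
#define M_NOWAIT	0x0001
#define M_WAITOK	0x0002

/*
 * If M_WAITOK is among the caller's flags, call the objcache layer
 * with M_WAITOK; otherwise with M_NOWAIT. All other bits are dropped.
 */
#define MB_OCFLAG(flags)	(((flags) & M_WAITOK) ? M_WAITOK : M_NOWAIT)
```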


Revision tags: v4.0.3, v4.0.2
# b92efbf5 25-Dec-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp: Enable path mtu discovery by default

This also eases the adoption of RFC 6864.


# 727ccde8 18-Dec-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

inet/inet6: Remove the v4-mapped address support

This greatly simplifies the code (even the IPv4 code) and avoids all kinds
of possible port theft.

INPCB:
- Nuke IN6P_IPV6_V6ONLY, which is always on after this commit.
- Change inp_vflag into inp_af (AF_INET or AF_INET6), since the socket
  is either IPv6 or IPv4, but never both. Set inpcb.inp_af in
  in_pcballoc() instead of in every pru_attach method. Add INP_ISIPV4()
  and INP_ISIPV6() macros to check the inpcb family (socket family and
  inpcb.inp_af are the same).
- Nuke the convoluted code in in_pcbbind() and in6_pcbbind() which was
  used to allow wildcard binding to accept IPv4 connections on IPv6
  wildcard-bound sockets.
- Nuke the code in in_pcblookup_pkthash() to match an IPv4 faddr with an
  IPv6 wildcard-bound socket.
- Nuke in6_mapped_{peeraddr,sockaddr,savefaddr}(); use in6_{setpeeraddr,
  setsockaddr,savefaddr}() directly.
- Nuke the v4-mapped address conversion functions.
- Don't allow binding to a v4-mapped address in in6_pcbbind().
- Don't allow connecting to a v4-mapped address in in6_pcbconnect().

TCP:
- Nuke the code in tcp_output() which takes care of the IP header TTL
  setting for v4-mapped IPv6 sockets.
- Don't allow binding to a v4-mapped address (through in6_pcbbind()).
- Don't allow connecting to a v4-mapped address and nuke the related code
  (PRUC_NAMALLOC etc.).
- Nuke the code (PRUC_FALLBACK etc.) to fall back to an IPv4 connection
  if the IPv6 connection fails, which is wrong.
- Nuke the code for v4-mapped IPv6 sockets in tcp6_soport().

UDP:
- Nuke the code for v4-mapped IPv6 sockets in udp_input() and udp_append().
- Don't allow binding to a v4-mapped address (through in6_pcbbind()).
- Don't allow connecting to a v4-mapped address.
- Don't allow sending datagrams to a v4-mapped address and nuke the related
  code in udp6_output().
- Nuke the code for v4-mapped IPv6 sockets in udp6_disconnect().

RIP:
- Don't allow sending packets to a v4-mapped address.
- Don't allow binding to a v4-mapped address.
- Don't allow connecting to a v4-mapped address.

Misc fixup:
- Don't force the rip pru_attach method to return 0. If in_pcballoc()
  fails, just return the error code.
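The family-check macros this commit introduces amount to a comparison on the new inp_af field. A sketch with a stand-in struct (the real macros operate on struct inpcb):

```c
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Stand-in for struct inpcb, carrying only the field we illustrate. */
struct inpcb_stub {
	int	inp_af;	/* AF_INET or AF_INET6, set once in in_pcballoc() */
};

#define INP_ISIPV4(inp)	((inp)->inp_af == AF_INET)
#define INP_ISIPV6(inp)	((inp)->inp_af == AF_INET6)
```

Because inp_af is set exactly once at allocation time, the socket is either IPv4 or IPv6 for its whole life, which is what makes removing the v4-mapped paths safe.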


Revision tags: v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# b0c17823 16-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add feature to allow sendbuf_auto to decrease the buffer size

* sysctl net.inet.tcp.sendbuf_auto (defaults to 1) is now able to
  decrease the tcp buffer size as well as increase it.

* Inflight bwnd data is used to determine how much to decrease the
  buffer. Inflight is enabled by default. If you disable it
  (net.inet.tcp.inflight_enable=0), sendbuf_auto will not be able
  to adjust buffer sizes down.

* Set net.inet.tcp.sendbuf_min (default 32768) to set the floor for
  any downward adjustment.

* Set net.inet.tcp.sendbuf_auto=2 to disable the decrease feature.
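The floor behaviour described above can be sketched as a clamp. This is an invented helper under the commit's stated defaults, not the kernel's code: the downward target comes from the inflight bandwidth estimate, floored at net.inet.tcp.sendbuf_min.

```c
/* Default floor per the commit message (net.inet.tcp.sendbuf_min). */
#define TCP_SENDBUF_MIN_DEFAULT	32768L

/*
 * Illustrative clamp: never shrink the send buffer below sendbuf_min,
 * otherwise track the inflight bwnd estimate.
 */
static long
sendbuf_shrink_target(long inflight_bwnd, long sendbuf_min)
{
	return inflight_bwnd > sendbuf_min ? inflight_bwnd : sendbuf_min;
}
```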


Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0
# cec73927 05-Sep-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change time_second to time_uptime for all expiration calculations

* Vet the entire kernel and change use cases for expiration calculations
  using time_second to use time_uptime instead.

* Protects these expiration calculations from step changes in the wall
  time, particularly needed for route table entries.

* Probably requires further variable type adjustments, but the use of
  time_uptime instead of time_second is highly unlikely to ever overrun
  any demotions to int still present.
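The point of the change can be shown with a tiny expiry check. Names are ours, not the kernel's: with a monotonic clock (time_uptime) the deadline only depends on elapsed time, so a step change in the wall clock (time_second) cannot fire or stall the expiry early.

```c
/*
 * Illustrative expiry check over a monotonic time base: an entry set at
 * set_at expires once ttl seconds of uptime have elapsed, regardless of
 * any wall-clock step in between.
 */
static int
uptime_expired(long now_uptime, long set_at_uptime, long ttl)
{
	return now_uptime - set_at_uptime >= ttl;
}
```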


Revision tags: v3.4.3
# 4cc8caef 08-Jun-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

altq: Implement two level "rough" priority queue for plain sub-queue

The "rough" part comes from two sources:
- Hardware queue could be deep, normally 512 or more even for GigE
- Round robin on the transmission queues is used by all of the multiple
  transmission queue capable hardware supported by DragonFly as of this
  commit.
These two sources affect the packet priority set by DragonFly.

DragonFly's "rough" priority queue has only two levels, i.e. high priority
and normal priority, which should be enough. Each queue has its own
header. The normal priority queue will be dequeued only when there are no
packets in the high priority queue. During enqueue, if the sub-queue is
full and the high priority queue length is less than half of the sub-
queue length (both packet count and byte count), drop-head will be
applied on the normal priority queue.

The M_PRIO mbuf flag is added to mark that the mbuf is destined for the
high priority queue. Currently TCP uses it to prioritize SYN, SYN|ACK,
and pure ACK w/o FIN and RST. This behaviour can be turned off via
net.inet.tcp.prio_synack, which is on by default.

The performance improvement!

The test environment:
All three boxes are using Intel i7-2600 w/ HT enabled

              +-----+
              |     |
   +->- emx1  |  B  | TCP_MAERTS
   |          |     |
+-----+       +-----+
|     |
|  A  | bnx0 ---+
|     |
+-----+       +-----+
   |          |     |
   +-<- emx1  |  C  | TCP_STREAM/TCP_RR
              |     |
              +-----+

A's kernel has this commit compiled. bnx0 has all four transmission
queues enabled. For bnx0, the hardware's transmission queue round-robin
is on TSO segment boundary.

Some baseline measurements:
B<--A TCP_MAERTS (raw stats) (128 clients): 984 Mbps
    (tcp_stream -H A -l 15 -i 128 -r)
C-->A TCP_STREAM (128 clients): 942 Mbps (tcp_stream -H A -l 15 -i 128)
C-->A TCP_CC (768 clients): 221199 conns/s (tcp_cc -H A -l 15 -i 768)

To effectively measure TCP_CC, the prefix route's MSL is changed to
10ms: route change 10.1.0.0/24 -msl 10

All stats gathered in the following measurements are below the baseline
measurements (well, they should be).

C-->A TCP_CC improvement, while the B<--A TCP_MAERTS test is running:
                        TCP_MAERTS(raw)   TCP_CC
TSO     prio_synack=1   948 Mbps          15988 conns/s
TSO     prio_synack=0   965 Mbps           8867 conns/s
non-TSO prio_synack=1   943 Mbps          18128 conns/s
non-TSO prio_synack=0   959 Mbps          11371 conns/s

* 80% TCP_CC performance improvement w/ TSO and 60% w/o TSO!

C-->A TCP_STREAM improvement, while the B<--A TCP_MAERTS test is running:
                        TCP_MAERTS(raw)   TCP_STREAM
TSO     prio_synack=1   969 Mbps          920 Mbps
TSO     prio_synack=0   969 Mbps          865 Mbps
non-TSO prio_synack=1   969 Mbps          920 Mbps
non-TSO prio_synack=0   969 Mbps          879 Mbps

* 6% TCP_STREAM performance improvement w/ TSO and 4% w/o TSO.
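The drop-head admission rule described above can be sketched as a predicate. Illustrative C with invented names, not the altq code: a high-priority packet may displace normal-priority packets only when the sub-queue is full and the high-priority queue is under half of the sub-queue's limits in both packet and byte count.

```c
#include <stdbool.h>

/*
 * Illustrative enqueue policy: return true when a high-priority packet
 * should be admitted by applying drop-head to the normal-priority queue.
 */
static bool
hiprio_may_drophead(bool subq_full, int hi_pkts, long hi_bytes,
    int subq_pkt_limit, long subq_byte_limit)
{
	return subq_full &&
	    hi_pkts < subq_pkt_limit / 2 &&
	    hi_bytes < subq_byte_limit / 2;
}
```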


# dc71b7ab 31-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it:
http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadler.com>


Revision tags: v3.4.2
# 2702099d 06-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Remove advertising clause from all that isn't contrib or userland bin.

By: Eitan Adler <lists@eitanadler.com>


# 5337421c 02-May-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

netisr: Inline netisr_cpuport() and netisr_curport()

These two functions do nothing more than return a pointer to an
element in the array.

Per our header file naming convention, put these two functions in
net/netisr2.h
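Reduced to its essence, the function being inlined looks like this. The array and names are stand-ins for the netisr per-CPU port table; the point is that a plain array lookup is cheap enough to belong in a header as a static inline.

```c
#define NCPUS_STUB	4

/* Stand-in for the per-CPU netisr port table. */
static int netisr_ports_stub[NCPUS_STUB];

/* Illustrative static inline: just returns a pointer into the array. */
static inline int *
netisr_cpuport_stub(int cpu)
{
	return &netisr_ports_stub[cpu];
}
```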


# ec7f7fc8 28-Apr-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

netisr: Function renaming; no functional changes

This cleans up code for keeping input packets' hash instead of masking
the hash with ncpus2_mask. netisr_hashport(), which maps a packet hash
to a netisr port, will be added soon.


Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0
# 6999cd81 26-Feb-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Beef up lwkt_dropmsg() API and fix deadlock in so_async_rcvd*()

* Beef up the lwkt_dropmsg() API. The API now conditionally returns
  success (0) or an error (ENOENT).

* so_pru_rcvd_async() improperly calls lwkt_sendmsg() with a spinlock
  held. This is not legal. Hack up lwkt_sendmsg() a bit to resolve.


# d3d26ea5 23-Jan-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp: Add comment about "fairsend"


# 2fb3a851 17-Jan-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp: Improve sender-sender and sender-receiver fairness on the same netisr

Yield to other senders or receivers on the same netisr if the current TCP
stream has sent a certain number of segments (currently 4) and is going to
burst more segments. sysctl net.inet.tcp.fairsend can be used to tune
how many segments are allowed to burst. For TSO capable devices, their
TSO aggregation size limit could also affect the number of segments
allowed to burst. Setting net.inet.tcp.fairsend to 0 allows a single TCP
stream to burst as much as it wants (the old TCP sender behaviour).

"Fairsend" is performed at the places that do not affect segment sending
during congestion control:
- User requested output path
- ACK input path

Measured improvement in the following setup:

+---+             +---+
|   |<------------| B |
|   |             +---+
| A |
|   |             +---+
|   |------------>| C |
+---+             +---+

A (i7-2600, w/ HT enabled), 82571EB
B (e3-1230, w/ HT enabled), 82574L
C (e3-1230, w/ HT enabled), 82574L
The performance stats are gathered from 'systat -if 1'

When A runs 8 TCP senders to C and 8 TCP receivers from B, the sending
performance is the same ~975Mbps; however, the receiving performance
before this commit stumbles between 670Mbps and 850Mbps, while w/
"fairsend" the receiving performance stays at 981Mbps.

When A runs 16 TCP senders to C and 16 TCP receivers from B, the sending
performance is the same ~975Mbps; however, the receiving performance
before this commit goes from 960Mbps to 980Mbps, while w/ "fairsend" the
receiving performance stays stably at 981Mbps.

When there are more senders and receivers running on A, there is no
noticeable performance difference on either sending or receiving between
non-"fairsend" and "fairsend", because senders are no longer able to do
continuous large bursts.

"Fairsend" also improves Jain's fairness index between various numbers of
senders (8 ~ 128) a little bit (sending-only tests).
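The yield decision described above is a small predicate. An illustrative sketch with invented names, not the actual tcp_output() code: once a stream has sent fairsend segments (default 4 per this commit) and still has more to burst, it yields the netisr; fairsend == 0 disables yielding entirely.

```c
#include <stdbool.h>

/*
 * Illustrative "fairsend" check: yield to other senders/receivers on
 * the same netisr after bursting `fairsend` segments with more pending.
 */
static bool
fairsend_should_yield(int segs_sent, bool more_to_send, int fairsend)
{
	return fairsend > 0 && segs_sent >= fairsend && more_to_send;
}
```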


# e41e61d5 16-Jan-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp/tso: Add per-device TSO aggregation size limit

- Prevent possible TSO large bursts when they are inappropriate (plenty of
  >24 segment bursts were observed, even when 32 parallel sending TCP
  streams are running on the same GigE NIC).
  TSO large bursts have the following drawbacks on a single TX queue,
  even on devices that are capable of multiple TX queues:
  o Delay other senders' packet transmission quite a lot.
  o Have a negative effect on TCP receivers, which send ACKs.
  o Cause buffer bloat in software sending queues, whose upper limit is
    based on "packet count".
  o Packet scheduler's decisions could be less effective.
  On the other hand, TSO large bursts could improve CPU usage.
- Improve fairness between multiple TX queues on devices that are capable
  of multiple TX queues but only fetch data on TSO large packet boundaries
  instead of TCP segment boundaries.

Drivers can supply their own TSO aggregation size limit. If a driver
does not set it, the default value is 6000 (4 segments if the MTU is
1500). The default value increases CPU usage a little bit: on an i7-2600
w/ HT enabled, with a single TCP sending stream, CPU usage increases
from 14%~17% to 17%~20%.

Users can configure the TSO aggregation size limit by using ifconfig(8):
ifconfig ifaceX tsolen _n_
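The limit amounts to a clamp on the bytes handed to the NIC per TSO burst. An invented helper under the commit's stated default, not the kernel's code: the burst is capped at the device tsolen (default 6000 bytes, ~4 segments at MTU 1500) rather than the much larger frame a TSO engine could otherwise aggregate.

```c
/*
 * Illustrative TSO burst clamp: cap the bytes aggregated per burst at
 * the device's tsolen, falling back to the default of 6000 when the
 * driver did not set one.
 */
static long
tso_burst_limit(long want_bytes, long dev_tsolen)
{
	long limit = dev_tsolen > 0 ? dev_tsolen : 6000;

	return want_bytes < limit ? want_bytes : limit;
}
```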


# 4f483122 07-Jan-2013 Sascha Wildner <saw@online.de>

kernel/tcp_{input,output}: Remove some unused variables.

