.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.\" $DragonFly: src/share/man/man4/tcp.4,v 1.9 2008/10/17 11:30:24 swildner Exp $
.\"
.Dd February 14, 1995
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq port addresses .
Thus, each address is composed
of an Internet address specifying the host and network, with
a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the tcp protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets. By default
.Tn TCP
sockets are created active; to create a
passive socket the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call. Only
passive sockets may use the
.Xr accept 2
call to accept incoming connections. Only active sockets may
use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks. This
technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound. The
.Tn TCP
port may still be specified
at this time; if the port is not specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location. The address assigned the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally
this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_NODELAYx
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender\- and receiver-TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection. The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation. The boolean
option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the sender-TCP
will set the
.Dq push
bit and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When the
.Dv TCP_NOPUSH
option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_SIGNATURE_ENABLE
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this in a router deployment is to enable
DragonFly-based routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Sh MIB VARIABLES
The
.Nm
protocol implements a number of variables in the
.Li net.inet
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
.Pq tcp.rfc1323
Implement the window scaling and timestamp options of RFC 1323
(default true).
.It Dv TCPCTL_MSSDFLT
.Pq tcp.mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq tcp.sendspace
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
.Pq tcp.recvspace
Maximum TCP receive window.
.It tcp.log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
packets only.
A value of 2 results in any TCP packets to closed ports being logged.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
.It tcp.msl
The Maximum Segment Lifetime for a packet.
.It tcp.keepinit
Timeout for new, non-established TCP connections.
.It tcp.keepidle
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
.It tcp.keepintvl
The interval between keepalive probes sent to remote machines.
After
.Dv TCPTV_KEEPCNT
(default 8) probes are sent, with no response, the connection is dropped.
.It tcp.always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It tcp.icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It tcp.do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It tcp.blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It tcp.delayed_ack
Delay ACK to try and piggyback it onto a data packet.
.It tcp.delacktime
Maximum amount of time before a delayed ACK is sent.
.It tcp.newreno
Enable the TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring lots of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
set of clients.
.It tcp.tcbhashsize
Size of the
.Tn TCP
control-block hashtable
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It tcp.pcbcount
Number of active protocol control blocks
(read-only).
.It tcp.syncookies
Determines whether or not syn cookies should be generated for
outbound syn-ack packets. Syn cookies are a great help during
syn flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP. The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.
While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
Enable
.Tn TCP
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
.It tcp.inflight_min
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging. 6000-16000 is more typical
in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
.It tcp.inflight_max
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
The bandwidth delay product algorithm requires a slightly larger window
than it otherwise calculates for stability. This parameter determines the
extra window in maximal packets / 10. The default value of 20 represents
2 maximal packets. Reducing this value is not recommended but you may
come across a situation with very slow links where the ping time
reduction of the default inflight code is not sufficient. If this case
occurs you should first try reducing tcp.inflight_min and, if that does not
work, reduce both tcp.inflight_min and tcp.inflight_stab, trying values of
15, 10, or 5 for the latter. Never use a value less than 5. Reducing
tcp.inflight_stab can lead to upwards of a 20% underutilization of the link
as well as reducing the algorithm's ability to adapt to changing
situations and should only be done as a last resort.
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr setkey 8
.Rs
.%A V. Jacobson
.%A R. Braden
.%A D. Borman
.%T "TCP Extensions for High Performance"
.%O RFC 1323
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Sh HISTORY
The
.Nm
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
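.Sh EXAMPLES
The following is a minimal sketch, not taken from any system source, of two
steps described in
.Sx DESCRIPTION :
creating a passive socket bound to
.Dv INADDR_ANY
with a system-assigned port, and setting the boolean
.Dv TCP_NODELAY
option at the
.Dv IPPROTO_TCP
level.
The helper names
.Fn tcp_listen_any
and
.Fn tcp_set_nodelay
are hypothetical and exist only for illustration; the backlog of 5 is an
arbitrary choice.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any system library): create a
 * passive TCP socket bound to INADDR_ANY.  A port of 0 lets the
 * system assign one; the port actually in use is stored in
 * *port_out in host byte order.
 * Returns the listening descriptor, or -1 on error.
 */
int
tcp_listen_any(uint16_t port, uint16_t *port_out)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	assert(port_out != NULL);
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(port);
	/* bind(2) first, then listen(2) makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*port_out = ntohs(sin.sin_port);
	return (s);
}

/*
 * Hypothetical helper: enable TCP_NODELAY with setsockopt(2) and
 * read it back with getsockopt(2).
 * Returns 1 if the option is enabled afterwards, 0 if it is not,
 * and -1 on error.
 */
int
tcp_set_nodelay(int s)
{
	int on = 1, val = 0;
	socklen_t len = sizeof(val);

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1)
		return (-1);
	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```
.Ed
.Pp
A caller might obtain a listening socket with
.Li "tcp_listen_any(0, &port)"
and pass a descriptor returned by
.Xr accept 2 ,
or an active socket used with
.Xr connect 2 ,
to
.Fn tcp_set_nodelay .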