.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\" $FreeBSD: src/share/man/man4/tcp.4,v 1.11.2.14 2002/12/29 16:35:38 schweikh Exp $
.\"
.Dd April 21, 2018
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data. It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction. TCP uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq port addresses .
Thus, each address is composed
of an Internet address specifying the host and network, with
a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the tcp protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets. By default
.Tn TCP
sockets are created active; to create a
passive socket the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call. Only
passive sockets may use the
.Xr accept 2
call to accept incoming connections. Only active sockets may
use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks. This
technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound. The
.Tn TCP
port may still be specified
at this time; if the port is not specified the system will assign one.
Once a connection has been established the socket's address is
fixed by the peer entity's location.
The address assigned the
socket is the address associated with the network interface
through which packets are being transmitted and received. Normally
this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_NODELAYx
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender\- and receiver-TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection. The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation. The boolean
option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the sender-TCP
will set the
.Dq push
bit and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When the
.Dv TCP_NOPUSH
option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.\".It Dv TCP_SIGNATURE_ENABLE
.\"This option enables the use of MD5 digests (also known as TCP-MD5)
.\"on writes to the specified socket.
.\"In the current release, only outgoing traffic is digested;
.\"digests on incoming traffic are not verified.
.\"The current default behavior for the system is to respond to a system
.\"advertising this option with TCP-MD5; this may change.
.\".Pp
.\"One common use for this in a DragonFlyBSD router deployment is to enable
.\"based routers to interwork with Cisco equipment at peering points.
.\"Support for this feature conforms to RFC 2385.
.\"Only IPv4 (AF_INET) sessions are supported.
.\".Pp
.\"In order for this option to function correctly, it is necessary for the
.\"administrator to add a tcp-md5 key entry to the system's security
.\"associations database (SADB) using the
.\".Xr setkey 8
.\"utility.
.\"This entry must have an SPI of 0x1000 and can therefore only be specified
.\"on a per-host basis at this time.
.\".Pp
.\"If an SADB entry cannot be found for the destination, the outgoing traffic
.\"will have an invalid digest option prepended, and the following error message
.\"will be visible on the system console:
.\".Em "tcpsignature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPINIT
If a
.Tn TCP
connection cannot be established within a period of time,
.Tn TCP
will time out the connection attempt.
The
.Dv TCP_KEEPINIT
option specifies the number of seconds to wait
before the connection attempt times out.
The default value for
.Dv TCP_KEEPINIT
is tcp.keepinit seconds.
For the accepted sockets, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
.It Dv TCP_KEEPIDLE
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPIDLE
option specifies the number of seconds before
.Tn TCP
will send the initial keepalive probe.
The default value for
.Dv TCP_KEEPIDLE
is tcp.keepidle seconds.
For the accepted sockets,
the
.Dv TCP_KEEPIDLE
option value is inherited from the listening socket.
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPINTVL
option specifies the number of seconds to wait
before retransmitting a keepalive probe.
The default value for
.Dv TCP_KEEPINTVL
is tcp.keepintvl seconds.
For the accepted sockets,
the
.Dv TCP_KEEPINTVL
option value is inherited from the listening socket.
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled,
.Tn TCP
sends a keepalive probe to the remote system of a connection
that has been idle for a period of time.
The
.Dv TCP_KEEPCNT
option specifies the maximum number of keepalive
probes to be sent before dropping the connection.
The default value for
.Dv TCP_KEEPCNT
is tcp.keepcnt.
For the accepted sockets,
the
.Dv TCP_KEEPCNT
option value is inherited from the listening socket.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Sh MIB VARIABLES
The
.Nm
protocol implements a number of variables in the
.Li net.inet
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
.Pq tcp.rfc1323
Implement the window scaling and timestamp options of RFC 1323
(default true).
.It Dv TCPCTL_MSSDFLT
.Pq tcp.mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq tcp.sendspace
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
.Pq tcp.recvspace
Maximum TCP receive window.
.It tcp.log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
packets only.
A value of 2 results in any TCP packets to closed ports being logged.
Any value unlisted above disables the logging
(default is 0, i.e., the logging is disabled).
.It tcp.msl
The Maximum Segment Lifetime for a packet.
.It tcp.keepinit
Timeout for new, non-established TCP connections.
.It tcp.keepidle
Amount of time the connection should be idle before keepalive
probes (if enabled) are sent.
.It tcp.keepintvl
The interval between keepalive probes sent to remote machines.
After
tcp.keepcnt
(default 8) probes are sent, with no response, the connection is dropped.
.It tcp.keepcnt
The maximum number of keepalive probes to be sent
before dropping the connection.
.It tcp.always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It tcp.icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It tcp.do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It tcp.blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It tcp.delayed_ack
Delay ACK to try to piggyback it onto a data packet.
.It tcp.delacktime
Maximum amount of time before a delayed ACK is sent.
.It tcp.newreno
Enable TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enables Path MTU Discovery. PMTU Discovery is helpful for avoiding
IP fragmentation when transferring lots of data to the same client.
For web servers, where most of the connections are short and to
different clients, PMTU Discovery actually hurts performance due
to unnecessary retransmissions. Turn this on only if most of your
TCP connections are long transfers or are repeatedly to the same
set of clients.
.It tcp.tcbhashsize
Size of the
.Tn TCP
control-block hashtable
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It tcp.pcbcount
Number of active protocol control blocks
(read-only).
.It tcp.syncookies
Determines whether or not syn cookies should be generated for
outbound syn-ack packets. Syn cookies are a great help during
syn flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP.
The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum. While a number of TCP RFCs suggest a 1
second minimum these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code. For this reason we suggest changing the slop to 200ms and
setting the minimum to something out of the way, like 20ms,
which gives you an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
Enable
.Tn TCP
bandwidth delay product limiting. An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network. This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows. In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches. Note that bandwidth delay product limiting only affects
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm. This may
default to on (1) so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
.It tcp.inflight_min
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging.
6000-16000 is more typical
in a production installation. Setting this value too low may result in
slow ramp-up times for bursty connections. Setting this value too high
effectively disables the algorithm.
.It tcp.inflight_max
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
This value stabilizes the bwnd (write window) calculation at high speeds
by increasing the bandwidth calculation in 1/10% increments. The default
value of 50 represents a +5% increase. In addition, bwnd is further increased
by a fixed 2*maxseg bytes to stabilize the algorithm at low speeds.
Changing the stab value is not recommended, but you may come across
situations where tuning is beneficial.
However, our recommendation for tuning is to stick with only adjusting
tcp.inflight_min.
Reducing tcp.inflight_stab too much can lead to upwards of a 20%
underutilization of the link and prevent the algorithm from properly adapting
to changing situations. Increasing tcp.inflight_stab too much can lead to
an excessive packet buffering situation.
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4
.Rs
.%A V. Jacobson
.%A R. Braden
.%A D. Borman
.%T "TCP Extensions for High Performance"
.%O RFC 1323
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Sh HISTORY
The
.Nm
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
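.Sh EXAMPLES
The following is a minimal sketch, not a definitive implementation,
of the wildcard binding and socket option handling described in
.Sx DESCRIPTION .
The helper names
.Fn tcp_listen_any
and
.Fn tcp_set_nodelay
are hypothetical and exist only for this illustration,
and the listen backlog of 5 is an arbitrary assumption.
The system calls and constants are the ones named in this page.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not a system API): create a passive TCP socket
 * bound to the wildcard address INADDR_ANY.  A port of 0 leaves the
 * choice of port to the system; the port actually bound is stored in
 * *port_out in host byte order.  Returns the descriptor, or -1 on error.
 */
int
tcp_listen_any(unsigned short port, unsigned short *port_out)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	assert(port_out != NULL);

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(port);

	/* bind(2) first, then listen(2) makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||	/* backlog of 5 is arbitrary */
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return (-1);
	}
	*port_out = ntohs(sin.sin_port);
	return (s);
}

/*
 * Hypothetical helper: set the boolean TCP_NODELAY option at level
 * IPPROTO_TCP and read it back with getsockopt(2).
 * Returns 1 if the option is on, 0 if it is off, -1 on error.
 */
int
tcp_set_nodelay(int s, int on)
{
	int val = on ? 1 : 0;
	socklen_t len = sizeof(val);

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) == -1)
		return (-1);
	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```

A caller might invoke tcp_listen_any(0, &port) to obtain a listening
descriptor on a system-assigned port, and then tcp_set_nodelay(s, 1)
to disable the small-packet gathering described under TCP_NODELAY.
Other boolean options from netinet/tcp.h could be set in the same way.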