Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2 |
|
#
7eaeff3d |
| 07-Aug-2019 |
Roy Marples <roy@marples.name> |
socket: introduce SO_RERROR to detect receive buffer overflow
kernel receive buffers are initially of a limited size and generally the network protocols that use them don't care if a packet gets los
socket: introduce SO_RERROR to detect receive buffer overflow
kernel receive buffers are initially of a limited size and generally the network protocols that use them don't care if a packet gets lost.
However some users do care about lost messages even if not baked into the protocol - such as consumers of route(4) to track state.
POSIX states that read(2) can return an error of ENOBUFS so return this error code when an overflow is detected. Guard this with socket option SO_RERROR so that existing applications which do not care can carry on not caring by default.
Taken-from: NetBSD Reviewed-by: sephe
show more ...
|
Revision tags: v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3 |
|
#
2d5847e2 |
| 01-May-2019 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add kern.ipc.soaccept_reuse and set default to 1
* This feature, enabled by default, allows a service listening on a socket to be killed and restarted without causing "bind: Address alr
kernel - Add kern.ipc.soaccept_reuse and set default to 1
* This feature, enabled by default, allows a service listening on a socket to be killed and restarted without causing "bind: Address already in use" errors due to accepted connections still being present.
* The accepted connections may still be present either because they are still in active use (though typically this is not the case when a service is killed... its children also get killed). But also, more importantly, if the sockets are still present due to lingering on a TCP timeout.
In both of these situations we allow bind() to ignore matches against accepted connections. This allows a service to be restared without having to set SO_REUSEADDR (for example named/bind generally does not set SO_REUSEADDR and restarting can be a pain).
show more ...
|
Revision tags: v5.4.2 |
|
#
fcf6efef |
| 02-Mar-2019 |
Sascha Wildner <saw@online.de> |
kernel: Remove numerous #include <sys/thread2.h>.
Most of them were added when we converted spl*() calls to crit_enter()/crit_exit(), almost 14 years ago. We can now remove a good chunk of them agai
kernel: Remove numerous #include <sys/thread2.h>.
Most of them were added when we converted spl*() calls to crit_enter()/crit_exit(), almost 14 years ago. We can now remove a good chunk of them again for where crit_*() are no longer used.
I had to adjust some files that were relying on thread2.h or headers that it includes coming in via other headers that it was removed from.
show more ...
|
Revision tags: v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2 |
|
#
20faa324 |
| 03-Jan-2016 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
kqueue: Move notifymsglist out of kqinfo
It is only used by socket code.
|
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc |
|
#
37e299d5 |
| 22-Aug-2015 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Allow keeping a reference on the new socket in sonewconn_faddr
It will be used to fix unix socket races.
|
#
e994cde3 |
| 21-Aug-2015 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Reorder state setting a little bit in sonewconn_faddr()
This prevents several possible races.
|
#
ee39a18e |
| 19-Aug-2015 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Assert SS_{INCOMP,COMP} before deq/enq so_{comp,incomp}
Suggested-by: dillon@
|
Revision tags: v4.2.4, v4.3.1 |
|
#
3735a885 |
| 30-Jul-2015 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Close the soreference() race against socket owner netisr sofree()
The race is kinda like this:
Other thread/netisrN netisrM (so->so_pcb owner) :
socket: Close the soreference() race against socket owner netisr sofree()
The race is kinda like this:
Other thread/netisrN netisrM (so->so_pcb owner) : : getpooltoken(head); : so->so_head = NULL; : : sofree(so); (*) soreference(so); : relpooltoken(head); :
(*) sofree(so) frees the socket, since so->so_head is NULL and getpooltoken(head) is not called.
Reported-by: dillon@
show more ...
|
Revision tags: v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4 |
|
#
b5523eac |
| 19-Feb-2015 |
Sascha Wildner <saw@online.de> |
kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.
The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD
kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.
The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD because it tended to get forgotten and the code would compile anyway with the wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK and M_NOWAIT), it was just one big converting back and forth with some sanitization in between.
This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro which does the same as the former MBTOM().
Approved-by: dillon
show more ...
|
Revision tags: v4.0.3, v4.0.2 |
|
#
b981a49d |
| 25-Dec-2014 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Add KTR_SOWAKEUP
Define 2 pairs of children nodes for this KTR, which are used to tracking extra IPIs for accept(2).
Note: The tracked sorwakeup() and the wakeup(so_timeo) does not generate
socket: Add KTR_SOWAKEUP
Define 2 pairs of children nodes for this KTR, which are used to tracking extra IPIs for accept(2).
Note: The tracked sorwakeup() and the wakeup(so_timeo) does not generate extra (wakeup) IPIs.
show more ...
|
Revision tags: v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2 |
|
#
a5e93826 |
| 18-Jul-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Adjust ssb_space_prealloc() use cases
* Add two flags to the signalsockbuf ssb_flags field.
SSB_PREALLOC - Indicates that data preallocation tracking is being used SSB_STOPSUPP - Indi
kernel - Adjust ssb_space_prealloc() use cases
* Add two flags to the signalsockbuf ssb_flags field.
SSB_PREALLOC - Indicates that data preallocation tracking is being used SSB_STOPSUPP - Indicates that SSB_STOP flow control is being used
* unix domain sockets set SSB_STOPSUPP, tcp and sctp sockets set SSB_PREALLOC.
* sendfile() requires that either SSB_PREALLOC or SSB_STOPSUPP be specified.
* Code now conditionalizes the use of ssb_space() vs ssb_space_prealloc() based on the presence of the SSB_PREALLOC flag.
Reported-by: sephe
show more ...
|
#
b210f45e |
| 18-Jul-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - network adjustments (netisr, tcp, and socket buffer changes)
* Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for a quick return. It is now coded properly. Previous c
kernel - network adjustments (netisr, tcp, and socket buffer changes)
* Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for a quick return. It is now coded properly. Previous coding is not known to have created any bugs.
* Change sowakeup() to use ssb_space_prealloc() instead of ssb_space() when testing against the transmit low-water mark. This is a bug fix which primarily effects very tiny write()'s. The prior code is not known to have created any problems.
* Make the netisr packet counter before doing a rollup programmer and change the default from 512 to 32 for the moment. This may be changed back to 512 (or some number inbetween) after further testing.
The issue here is that interrupt/netisr pipelining can cause ack aggregation to be delayed for too many packets.
* For TCP, when timestamps are not being used, pass the correct delta to tcp_xmit_timer() in our fallback. The function expects N+1. This should improve/fix incorrect rtt calculations when tcp timestamps are not in use.
* Fix an edge case in tcp_xmit_bandwidth_limit() where the 'ticks' global could change values out from under the code. Load the global into a local variable.
* Change the inflight code to use (t_srtt + t_rttvar) instead of (t_srtt + t_rttbest) / 2.
This needs fine-tuning, the buffer is still too big. Expect more commits later.
* Call sowwakeup() when appending a mbuf to a stream. The append can call sbcompress() and make a stream buffer that has hit its mbuf limit writable again.
* Remove the ssb_notify() macro and collapse the sorwakeup() and sowwakeup() macros. They now just call sowakeup() on the appropriate sockbuf. The notify test is now done in sowakeup().
show more ...
|
#
11b81f5d |
| 16-Jul-2014 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Improve TCP socket handling at high speeds
* Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress() from collapsing more data into a mbuf.
* Rewrite sorecvtcp() (NOTE: sorecei
kernel - Improve TCP socket handling at high speeds
* Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress() from collapsing more data into a mbuf.
* Rewrite sorecvtcp() (NOTE: soreceive() could use similar treatment). Use M_SOLOCKED to freeze mbufs in the sockbuf with the rcvtok held, then do the uiomove() loop WITHOUT the rcvtok held, then finalize the disposal of the mbufs with rcvtok held.
This greatly reduces contention on rcvtok against the netisr threads when reading large amounts of data at once and reduces cpu overhead for netisr and user network threads.
* Change the default transmit ssb_lowat from ssb_hiwat / 2 to ssb_hiwat / 4. The (previous) default maximum socket buffer size was 256KB. The default lowat reduced the effective TCP transmit window to ~100KB. This can cause severe buffering issues on GiGE links when multiple TCP streams are being routed to the same cpu.
With this change the default max send buffer is ~180KB or so.
* Change the default kern.ipc.maxsockbuf from 256KB to 512KB. This primarily effects auto-sizing of tcp buffers which in turn effects most TCP connections.
This coupled with the hiwat fix greatly improves transmit throughput.
* Add more debugging info to the tcp inflight code.
show more ...
|
#
fd27efb4 |
| 18-Jun-2014 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: {soabort,so_pru_abort}a -> {soabort,so_pru_abort}_async
No functional change. They are consistent w/ other so and so_pru function names.
|
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3 |
|
#
dc71b7ab |
| 31-May-2013 |
Justin C. Sherrill <justin@shiningsilence.com> |
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069
Submitted-by: "Eitan Adler" <lists at eitanadl
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069
Submitted-by: "Eitan Adler" <lists at eitanadler.com>
show more ...
|
Revision tags: v3.4.2 |
|
#
2702099d |
| 06-May-2013 |
Justin C. Sherrill <justin@shiningsilence.com> |
Remove advertising clause from all that isn't contrib or userland bin.
By: Eitan Adler <lists@eitanadler.com>
|
#
5337421c |
| 02-May-2013 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
netisr: Inline netisr_cpuport() and netisr_curport()
These two functions do nothing more than just return pointer to the element in the array.
Per our header file naming convention, put these two f
netisr: Inline netisr_cpuport() and netisr_curport()
These two functions do nothing more than just return pointer to the element in the array.
Per our header file naming convention, put these two functions in net/netisr2.h
show more ...
|
#
ec7f7fc8 |
| 28-Apr-2013 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
netisr: Function renaming; no functional changes
This cleans up code for keeping input packets' hash instead of masking the hash with ncpus2_mask. netisr_hashport(), which maps packet hash to netis
netisr: Function renaming; no functional changes
This cleans up code for keeping input packets' hash instead of masking the hash with ncpus2_mask. netisr_hashport(), which maps packet hash to netisr port, will be added soon.
show more ...
|
Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0 |
|
#
901b9bd6 |
| 19-Mar-2013 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
async_rcvd: Don't add/drop socket reference on hot path
Instead, add reference in tcp_attach(), and drop the reference in tcp_close()
|
Revision tags: v3.2.2, v3.2.1, v3.2.0, v3.3.0 |
|
#
857e4745 |
| 14-Sep-2012 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Fix unix domain socket portfn routing
* sonewconn_faddr() / sonewconn() was improperly overriding the sync_port setting for unix domain sockets, causing unnecessary netmsg traffic to th
kernel - Fix unix domain socket portfn routing
* sonewconn_faddr() / sonewconn() was improperly overriding the sync_port setting for unix domain sockets, causing unnecessary netmsg traffic to the netisr threads.
* This should significantly improve unix domain socket performance.
With-help-from: sephe
show more ...
|
#
3abced87 |
| 11-Sep-2012 |
Nuno Antunes <nuno.antunes@gmail.com> |
netisr: rename cpu_portfn() to netisr_portfn().
No functional change.
Searched and replaced with: find sys/ -type f -exec sed -i "" 's/cpu_portfn/netisr_portfn/g' '{}' \;
|
#
96c6eb29 |
| 03-Sep-2012 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
tcp: Implement asynchronized pru_rcvd
This mainly avoids extra scheduling cost on the reception path due to lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd.
Since TCP's pru_rcvd
tcp: Implement asynchronized pru_rcvd
This mainly avoids extra scheduling cost on the reception path due to lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd.
Since TCP's pru_rcvd could be batched, one pru_rcvd netmsg is embedded into struct socket to avoid pru_rcvd netmsg allocation for each pru_rcvd, and this netmsg will be used by lwkt_sendmsg(). Whether this embedded pcu_rcvd netmsg should be sent or not is determined by its MSG_DONE bit. Since user thread and netisr thread could be on different CPUs, the embedded pru_rcvd netmsg's MSG_DONE bit is protected by a spinlock.
To cope with the following race that could drop window updates, tcp_usr_rcvd() replies asynchronized rcvd netmsg before tcp_output():
netisr thread user thread
tcp_usr_rcvd() sorcvtcp() { { tcp_output() : : : : sbunlinkmbuf() : if (rcvd & MSG_DONE) (2) : lwkt_sendmsg(rvcd) : : lwkt_replymsg(rcvd) (1) }
At (2) window update is dropped, since rcvd netmsg is not replied yet at (1)
The result: On i7-2600 (4C/8T, 3.4GHz): 32 parallel netperf -H 127.0.0.1 -t TCP_STREAM -P0 -l 30 (4 runs, unit: Mbps)
old 30253.88 30242.58 30162.55 30101.51 new 33962.74 33798.70 33499.92 33482.35
This gives ~12% performance improvement.
show more ...
|
Revision tags: v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
88da6203 |
| 29-Nov-2011 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
accept: Save foreign address earlier, if protocol supports it
- Add so_faddr into socket, which records the accepted socket's foreign address. If it is set, kern_accept() will use it directly ins
accept: Save foreign address earlier, if protocol supports it
- Add so_faddr into socket, which records the accepted socket's foreign address. If it is set, kern_accept() will use it directly instead of calling protocol specific method to extract the foreign address. - Add protocol specific method, pru_safefaddr, which will save the foreign address into socket.so_faddr if the necessary information is supplied. This protocol method will only be called in protocol thread. - Pass the foreign address to sonewconn() if possible, so the foreign address could be saved before the accepted socket is put onto the complete list.
Currently only IPv4/TCP implemented pru_savefaddr
This intends to address the following problems: - Calling pru_accept directly from user context is not MPSAFE, we always races the socket.so_pcb check->use against protocol thread clear/free socket.so_pcb, though the race window is too tiny to be hit. To make it mpsafe, we should dispatch pru_accept to protocol thread. If socket.so_faddr is set here, we are race against nothing and nothing expensive like put the current user thread into sleep will happen. However, if the socket is dropped when it still sits on the complete list, the error will not be timely delivered, i.e. accept(2) will not return error, but the later on read(2)/write(2) on the socket will deliver the error. - Calling pru_accept directly races against the inpcb.inp_f{addr,port} setting up in the protocol thread, since inpcb.inp_f{addr,port} is setup _after_ the accepted socket was put onto the complete list.
user thread proto thread : : : accepted socket -> comp : (inpcb.inp_f{addr,port} are 0 here) comp -> socket : pru_accept : : setup inpcb.inp_f{addr,port}
Returning of 0.0.0.0:0 from accept(2) was observed on heavily loaded web servers.
show more ...
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
#
15d2bc79 |
| 16-Nov-2011 |
Sepherosa Ziehau <sephe@dragonflybsd.org> |
socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket
The soreserve and pru_attach could set these two flags internally, so the original code will only retain those two flags but not cl
socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket
The soreserve and pru_attach could set these two flags internally, so the original code will only retain those two flags but not clear them if the listen socket does not have them. We now explicitly check those two flags and then set or clear them accordingly.
show more ...
|