uipc_socket2.c - OpenGrok history log for /dragonfly/sys/kern/uipc

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2
# 7eaeff3d	07-Aug-2019	Roy Marples <roy@marples.name>	socket: introduce SO_RERROR to detect receive buffer overflow kernel receive buffers are initially of a limited size and generally the network protocols that use them don't care if a packet gets los socket: introduce SO_RERROR to detect receive buffer overflow kernel receive buffers are initially of a limited size and generally the network protocols that use them don't care if a packet gets lost. However some users do care about lost messages even if not baked into the protocol - such as consumers of route(4) to track state. POSIX states that read(2) can return an error of ENOBUFS so return this error code when an overflow is detected. Guard this with socket option SO_RERROR so that existing applications which do not care can carry on not caring by default. Taken-from: NetBSD Reviewed-by: sephe show more ...
Revision tags: v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3
# 2d5847e2	01-May-2019	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Add kern.ipc.soaccept_reuse and set default to 1 * This feature, enabled by default, allows a service listening on a socket to be killed and restarted without causing "bind: Address alr kernel - Add kern.ipc.soaccept_reuse and set default to 1 * This feature, enabled by default, allows a service listening on a socket to be killed and restarted without causing "bind: Address already in use" errors due to accepted connections still being present. * The accepted connections may still be present either because they are still in active use (though typically this is not the case when a service is killed... its children also get killed). But also, more importantly, if the sockets are still present due to lingering on a TCP timeout. In both of these situations we allow bind() to ignore matches against accepted connections. This allows a service to be restared without having to set SO_REUSEADDR (for example named/bind generally does not set SO_REUSEADDR and restarting can be a pain). show more ...
Revision tags: v5.4.2
# fcf6efef	02-Mar-2019	Sascha Wildner <saw@online.de>	kernel: Remove numerous #include <sys/thread2.h>. Most of them were added when we converted spl() calls to crit_enter()/crit_exit(), almost 14 years ago. We can now remove a good chunk of them agai kernel: Remove numerous #include <sys/thread2.h>. Most of them were added when we converted spl() calls to crit_enter()/crit_exit(), almost 14 years ago. We can now remove a good chunk of them again for where crit_*() are no longer used. I had to adjust some files that were relying on thread2.h or headers that it includes coming in via other headers that it was removed from. show more ...
Revision tags: v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# 20faa324	03-Jan-2016	Sepherosa Ziehau <sephe@dragonflybsd.org>	kqueue: Move notifymsglist out of kqinfo It is only used by socket code.
Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
# 37e299d5	22-Aug-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Allow keeping a reference on the new socket in sonewconn_faddr It will be used to fix unix socket races.
# e994cde3	21-Aug-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Reorder state setting a little bit in sonewconn_faddr() This prevents several possible races.
# ee39a18e	19-Aug-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Assert SS_{INCOMP,COMP} before deq/enq so_{comp,incomp} Suggested-by: dillon@
Revision tags: v4.2.4, v4.3.1
# 3735a885	30-Jul-2015	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Close the soreference() race against socket owner netisr sofree() The race is kinda like this: Other thread/netisrN netisrM (so->so_pcb owner) : socket: Close the soreference() race against socket owner netisr sofree() The race is kinda like this: Other thread/netisrN netisrM (so->so_pcb owner) : : getpooltoken(head); : so->so_head = NULL; : : sofree(so); () soreference(so); : relpooltoken(head); : () sofree(so) frees the socket, since so->so_head is NULL and getpooltoken(head) is not called. Reported-by: dillon@ show more ...
Revision tags: v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4
# b5523eac	19-Feb-2015	Sascha Wildner <saw@online.de>	kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions. The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions. The main reason is that our having to use the MB_WAIT and MB_DONTWAIT flags was a recurring issue when porting drivers from FreeBSD because it tended to get forgotten and the code would compile anyway with the wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK and M_NOWAIT), it was just one big converting back and forth with some sanitization in between. This commit allows M_* again for the mbuf functions and keeps the sanitizing as it was before: when M_WAITOK is among the passed flags, objcache functions will be called with M_WAITOK and when it is absent, they will be called with M_NOWAIT. All other flags are scrubbed by the MB_OCFLAG() macro which does the same as the former MBTOM(). Approved-by: dillon show more ...
Revision tags: v4.0.3, v4.0.2
# b981a49d	25-Dec-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Add KTR_SOWAKEUP Define 2 pairs of children nodes for this KTR, which are used to tracking extra IPIs for accept(2). Note: The tracked sorwakeup() and the wakeup(so_timeo) does not generate socket: Add KTR_SOWAKEUP Define 2 pairs of children nodes for this KTR, which are used to tracking extra IPIs for accept(2). Note: The tracked sorwakeup() and the wakeup(so_timeo) does not generate extra (wakeup) IPIs. show more ...
Revision tags: v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# a5e93826	18-Jul-2014	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Adjust ssb_space_prealloc() use cases * Add two flags to the signalsockbuf ssb_flags field. SSB_PREALLOC - Indicates that data preallocation tracking is being used SSB_STOPSUPP - Indi kernel - Adjust ssb_space_prealloc() use cases * Add two flags to the signalsockbuf ssb_flags field. SSB_PREALLOC - Indicates that data preallocation tracking is being used SSB_STOPSUPP - Indicates that SSB_STOP flow control is being used * unix domain sockets set SSB_STOPSUPP, tcp and sctp sockets set SSB_PREALLOC. * sendfile() requires that either SSB_PREALLOC or SSB_STOPSUPP be specified. * Code now conditionalizes the use of ssb_space() vs ssb_space_prealloc() based on the presence of the SSB_PREALLOC flag. Reported-by: sephe show more ...
# b210f45e	18-Jul-2014	Matthew Dillon <dillon@apollo.backplane.com>	kernel - network adjustments (netisr, tcp, and socket buffer changes) * Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for a quick return. It is now coded properly. Previous c kernel - network adjustments (netisr, tcp, and socket buffer changes) * Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for a quick return. It is now coded properly. Previous coding is not known to have created any bugs. * Change sowakeup() to use ssb_space_prealloc() instead of ssb_space() when testing against the transmit low-water mark. This is a bug fix which primarily effects very tiny write()'s. The prior code is not known to have created any problems. * Make the netisr packet counter before doing a rollup programmer and change the default from 512 to 32 for the moment. This may be changed back to 512 (or some number inbetween) after further testing. The issue here is that interrupt/netisr pipelining can cause ack aggregation to be delayed for too many packets. * For TCP, when timestamps are not being used, pass the correct delta to tcp_xmit_timer() in our fallback. The function expects N+1. This should improve/fix incorrect rtt calculations when tcp timestamps are not in use. * Fix an edge case in tcp_xmit_bandwidth_limit() where the 'ticks' global could change values out from under the code. Load the global into a local variable. * Change the inflight code to use (t_srtt + t_rttvar) instead of (t_srtt + t_rttbest) / 2. This needs fine-tuning, the buffer is still too big. Expect more commits later. * Call sowwakeup() when appending a mbuf to a stream. The append can call sbcompress() and make a stream buffer that has hit its mbuf limit writable again. * Remove the ssb_notify() macro and collapse the sorwakeup() and sowwakeup() macros. They now just call sowakeup() on the appropriate sockbuf. The notify test is now done in sowakeup(). show more ...
# 11b81f5d	16-Jul-2014	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Improve TCP socket handling at high speeds * Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress() from collapsing more data into a mbuf. * Rewrite sorecvtcp() (NOTE: sorecei kernel - Improve TCP socket handling at high speeds * Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress() from collapsing more data into a mbuf. * Rewrite sorecvtcp() (NOTE: soreceive() could use similar treatment). Use M_SOLOCKED to freeze mbufs in the sockbuf with the rcvtok held, then do the uiomove() loop WITHOUT the rcvtok held, then finalize the disposal of the mbufs with rcvtok held. This greatly reduces contention on rcvtok against the netisr threads when reading large amounts of data at once and reduces cpu overhead for netisr and user network threads. * Change the default transmit ssb_lowat from ssb_hiwat / 2 to ssb_hiwat / 4. The (previous) default maximum socket buffer size was 256KB. The default lowat reduced the effective TCP transmit window to ~100KB. This can cause severe buffering issues on GiGE links when multiple TCP streams are being routed to the same cpu. With this change the default max send buffer is ~180KB or so. * Change the default kern.ipc.maxsockbuf from 256KB to 512KB. This primarily effects auto-sizing of tcp buffers which in turn effects most TCP connections. This coupled with the hiwat fix greatly improves transmit throughput. * Add more debugging info to the tcp inflight code. show more ...
# fd27efb4	18-Jun-2014	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: {soabort,so_pru_abort}a -> {soabort,so_pru_abort}_async No functional change. They are consistent w/ other so and so_pru function names.
Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3
# dc71b7ab	31-May-2013	Justin C. Sherrill <justin@shiningsilence.com>	Correct BSD License clause numbering from 1-2-4 to 1-2-3. Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069 Submitted-by: "Eitan Adler" <lists at eitanadl Correct BSD License clause numbering from 1-2-4 to 1-2-3. Apparently everyone's doing it: http://svnweb.freebsd.org/base?view=revision&revision=251069 Submitted-by: "Eitan Adler" <lists at eitanadler.com> show more ...
Revision tags: v3.4.2
# 2702099d	06-May-2013	Justin C. Sherrill <justin@shiningsilence.com>	Remove advertising clause from all that isn't contrib or userland bin. By: Eitan Adler <lists@eitanadler.com>
# 5337421c	02-May-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	netisr: Inline netisr_cpuport() and netisr_curport() These two functions do nothing more than just return pointer to the element in the array. Per our header file naming convention, put these two f netisr: Inline netisr_cpuport() and netisr_curport() These two functions do nothing more than just return pointer to the element in the array. Per our header file naming convention, put these two functions in net/netisr2.h show more ...
# ec7f7fc8	28-Apr-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	netisr: Function renaming; no functional changes This cleans up code for keeping input packets' hash instead of masking the hash with ncpus2_mask. netisr_hashport(), which maps packet hash to netis netisr: Function renaming; no functional changes This cleans up code for keeping input packets' hash instead of masking the hash with ncpus2_mask. netisr_hashport(), which maps packet hash to netisr port, will be added soon. show more ...
Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0
# 901b9bd6	19-Mar-2013	Sepherosa Ziehau <sephe@dragonflybsd.org>	async_rcvd: Don't add/drop socket reference on hot path Instead, add reference in tcp_attach(), and drop the reference in tcp_close()
Revision tags: v3.2.2, v3.2.1, v3.2.0, v3.3.0
# 857e4745	14-Sep-2012	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Fix unix domain socket portfn routing * sonewconn_faddr() / sonewconn() was improperly overriding the sync_port setting for unix domain sockets, causing unnecessary netmsg traffic to th kernel - Fix unix domain socket portfn routing * sonewconn_faddr() / sonewconn() was improperly overriding the sync_port setting for unix domain sockets, causing unnecessary netmsg traffic to the netisr threads. * This should significantly improve unix domain socket performance. With-help-from: sephe show more ...
# 3abced87	11-Sep-2012	Nuno Antunes <nuno.antunes@gmail.com>	netisr: rename cpu_portfn() to netisr_portfn(). No functional change. Searched and replaced with: find sys/ -type f -exec sed -i "" 's/cpu_portfn/netisr_portfn/g' '{}' \;
# 96c6eb29	03-Sep-2012	Sepherosa Ziehau <sephe@dragonflybsd.org>	tcp: Implement asynchronized pru_rcvd This mainly avoids extra scheduling cost on the reception path due to lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd. Since TCP's pru_rcvd tcp: Implement asynchronized pru_rcvd This mainly avoids extra scheduling cost on the reception path due to lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd. Since TCP's pru_rcvd could be batched, one pru_rcvd netmsg is embedded into struct socket to avoid pru_rcvd netmsg allocation for each pru_rcvd, and this netmsg will be used by lwkt_sendmsg(). Whether this embedded pcu_rcvd netmsg should be sent or not is determined by its MSG_DONE bit. Since user thread and netisr thread could be on different CPUs, the embedded pru_rcvd netmsg's MSG_DONE bit is protected by a spinlock. To cope with the following race that could drop window updates, tcp_usr_rcvd() replies asynchronized rcvd netmsg before tcp_output(): netisr thread user thread tcp_usr_rcvd() sorcvtcp() { { tcp_output() : : : : sbunlinkmbuf() : if (rcvd & MSG_DONE) (2) : lwkt_sendmsg(rvcd) : : lwkt_replymsg(rcvd) (1) } At (2) window update is dropped, since rcvd netmsg is not replied yet at (1) The result: On i7-2600 (4C/8T, 3.4GHz): 32 parallel netperf -H 127.0.0.1 -t TCP_STREAM -P0 -l 30 (4 runs, unit: Mbps) old 30253.88 30242.58 30162.55 30101.51 new 33962.74 33798.70 33499.92 33482.35 This gives ~12% performance improvement. show more ...
Revision tags: v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 88da6203	29-Nov-2011	Sepherosa Ziehau <sephe@dragonflybsd.org>	accept: Save foreign address earlier, if protocol supports it - Add so_faddr into socket, which records the accepted socket's foreign address. If it is set, kern_accept() will use it directly ins accept: Save foreign address earlier, if protocol supports it - Add so_faddr into socket, which records the accepted socket's foreign address. If it is set, kern_accept() will use it directly instead of calling protocol specific method to extract the foreign address. - Add protocol specific method, pru_safefaddr, which will save the foreign address into socket.so_faddr if the necessary information is supplied. This protocol method will only be called in protocol thread. - Pass the foreign address to sonewconn() if possible, so the foreign address could be saved before the accepted socket is put onto the complete list. Currently only IPv4/TCP implemented pru_savefaddr This intends to address the following problems: - Calling pru_accept directly from user context is not MPSAFE, we always races the socket.so_pcb check->use against protocol thread clear/free socket.so_pcb, though the race window is too tiny to be hit. To make it mpsafe, we should dispatch pru_accept to protocol thread. If socket.so_faddr is set here, we are race against nothing and nothing expensive like put the current user thread into sleep will happen. However, if the socket is dropped when it still sits on the complete list, the error will not be timely delivered, i.e. accept(2) will not return error, but the later on read(2)/write(2) on the socket will deliver the error. - Calling pru_accept directly races against the inpcb.inp_f{addr,port} setting up in the protocol thread, since inpcb.inp_f{addr,port} is setup _after_ the accepted socket was put onto the complete list. user thread proto thread : : : accepted socket -> comp : (inpcb.inp_f{addr,port} are 0 here) comp -> socket : pru_accept : : setup inpcb.inp_f{addr,port} Returning of 0.0.0.0:0 from accept(2) was observed on heavily loaded web servers. show more ...
# 86d7f5d3	26-Nov-2011	John Marino <draco@marino.st>	Initial import of binutils 2.22 on the new vendor branch Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
# 15d2bc79	16-Nov-2011	Sepherosa Ziehau <sephe@dragonflybsd.org>	socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket The soreserve and pru_attach could set these two flags internally, so the original code will only retain those two flags but not cl socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket The soreserve and pru_attach could set these two flags internally, so the original code will only retain those two flags but not clear them if the listen socket does not have them. We now explicitly check those two flags and then set or clear them accordingly. show more ...
12 3 4