History log of /dragonfly/sys/kern/uipc_socket2.c (Results 1 – 25 of 79)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2
# 7eaeff3d 07-Aug-2019 Roy Marples <roy@marples.name>

socket: introduce SO_RERROR to detect receive buffer overflow

kernel receive buffers are initially of a limited size and
generally the network protocols that use them don't care
if a packet gets los

socket: introduce SO_RERROR to detect receive buffer overflow

kernel receive buffers are initially of a limited size and
generally the network protocols that use them don't care
if a packet gets lost.

However some users do care about lost messages even if not
baked into the protocol - such as consumers of route(4) to
track state.

POSIX states that read(2) can return an error of ENOBUFS so
return this error code when an overflow is detected.
Guard this with socket option SO_RERROR so that existing
applications which do not care can carry on not caring by
default.

Taken-from: NetBSD
Reviewed-by: sephe

show more ...


Revision tags: v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3
# 2d5847e2 01-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add kern.ipc.soaccept_reuse and set default to 1

* This feature, enabled by default, allows a service listening on
a socket to be killed and restarted without causing
"bind: Address alr

kernel - Add kern.ipc.soaccept_reuse and set default to 1

* This feature, enabled by default, allows a service listening on
a socket to be killed and restarted without causing
"bind: Address already in use" errors due to accepted connections still
being present.

* The accepted connections may still be present either because they
are still in active use (though typically this is not the case when
a service is killed... its children also get killed). But also, more
importantly, if the sockets are still present due to lingering on a
TCP timeout.

In both of these situations we allow bind() to ignore matches against
accepted connections. This allows a service to be restared without
having to set SO_REUSEADDR (for example named/bind generally does not
set SO_REUSEADDR and restarting can be a pain).

show more ...


Revision tags: v5.4.2
# fcf6efef 02-Mar-2019 Sascha Wildner <saw@online.de>

kernel: Remove numerous #include <sys/thread2.h>.

Most of them were added when we converted spl*() calls to
crit_enter()/crit_exit(), almost 14 years ago. We can now
remove a good chunk of them agai

kernel: Remove numerous #include <sys/thread2.h>.

Most of them were added when we converted spl*() calls to
crit_enter()/crit_exit(), almost 14 years ago. We can now
remove a good chunk of them again for where crit_*() are
no longer used.

I had to adjust some files that were relying on thread2.h
or headers that it includes coming in via other headers
that it was removed from.

show more ...


Revision tags: v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2
# 20faa324 03-Jan-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

kqueue: Move notifymsglist out of kqinfo

It is only used by socket code.


Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
# 37e299d5 22-Aug-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Allow keeping a reference on the new socket in sonewconn_faddr

It will be used to fix unix socket races.


# e994cde3 21-Aug-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Reorder state setting a little bit in sonewconn_faddr()

This prevents several possible races.


# ee39a18e 19-Aug-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Assert SS_{INCOMP,COMP} before deq/enq so_{comp,incomp}

Suggested-by: dillon@


Revision tags: v4.2.4, v4.3.1
# 3735a885 30-Jul-2015 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Close the soreference() race against socket owner netisr sofree()

The race is kinda like this:

Other thread/netisrN netisrM (so->so_pcb owner)
:

socket: Close the soreference() race against socket owner netisr sofree()

The race is kinda like this:

Other thread/netisrN netisrM (so->so_pcb owner)
: :
getpooltoken(head); :
so->so_head = NULL; :
: sofree(so); (*)
soreference(so); :
relpooltoken(head); :

(*)
sofree(so) frees the socket, since so->so_head is NULL and
getpooltoken(head) is not called.

Reported-by: dillon@

show more ...


Revision tags: v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4
# b5523eac 19-Feb-2015 Sascha Wildner <saw@online.de>

kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT
flags was a recurring issue when porting drivers from FreeBSD

kernel: Move us to using M_NOWAIT and M_WAITOK for mbuf functions.

The main reason is that our having to use the MB_WAIT and MB_DONTWAIT
flags was a recurring issue when porting drivers from FreeBSD because
it tended to get forgotten and the code would compile anyway with the
wrong constants. And since MB_WAIT and MB_DONTWAIT ended up as ocflags
for an objcache_get() or objcache_reclaimlist call (which use M_WAITOK
and M_NOWAIT), it was just one big converting back and forth with some
sanitization in between.

This commit allows M_* again for the mbuf functions and keeps the
sanitizing as it was before: when M_WAITOK is among the passed flags,
objcache functions will be called with M_WAITOK and when it is absent,
they will be called with M_NOWAIT. All other flags are scrubbed by the
MB_OCFLAG() macro which does the same as the former MBTOM().

Approved-by: dillon

show more ...


Revision tags: v4.0.3, v4.0.2
# b981a49d 25-Dec-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Add KTR_SOWAKEUP

Define 2 pairs of children nodes for this KTR, which are used to tracking
extra IPIs for accept(2).

Note:
The tracked sorwakeup() and the wakeup(so_timeo) does not generate

socket: Add KTR_SOWAKEUP

Define 2 pairs of children nodes for this KTR, which are used to tracking
extra IPIs for accept(2).

Note:
The tracked sorwakeup() and the wakeup(so_timeo) does not generate extra
(wakeup) IPIs.

show more ...


Revision tags: v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# a5e93826 18-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Adjust ssb_space_prealloc() use cases

* Add two flags to the signalsockbuf ssb_flags field.

SSB_PREALLOC - Indicates that data preallocation tracking is being used
SSB_STOPSUPP - Indi

kernel - Adjust ssb_space_prealloc() use cases

* Add two flags to the signalsockbuf ssb_flags field.

SSB_PREALLOC - Indicates that data preallocation tracking is being used
SSB_STOPSUPP - Indicates that SSB_STOP flow control is being used

* unix domain sockets set SSB_STOPSUPP, tcp and sctp sockets
set SSB_PREALLOC.

* sendfile() requires that either SSB_PREALLOC or SSB_STOPSUPP be specified.

* Code now conditionalizes the use of ssb_space() vs ssb_space_prealloc()
based on the presence of the SSB_PREALLOC flag.

Reported-by: sephe

show more ...


# b210f45e 18-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - network adjustments (netisr, tcp, and socket buffer changes)

* Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for
a quick return. It is now coded properly. Previous c

kernel - network adjustments (netisr, tcp, and socket buffer changes)

* Change sowakeup() to use an atomic fetch when testing WAIT/WAKEUP for
a quick return. It is now coded properly. Previous coding is not known
to have created any bugs.

* Change sowakeup() to use ssb_space_prealloc() instead of ssb_space()
when testing against the transmit low-water mark. This is a bug fix
which primarily effects very tiny write()'s. The prior code is not
known to have created any problems.

* Make the netisr packet counter before doing a rollup programmer and
change the default from 512 to 32 for the moment. This may be changed
back to 512 (or some number inbetween) after further testing.

The issue here is that interrupt/netisr pipelining can cause ack aggregation
to be delayed for too many packets.

* For TCP, when timestamps are not being used, pass the correct delta
to tcp_xmit_timer() in our fallback. The function expects N+1. This
should improve/fix incorrect rtt calculations when tcp timestamps are
not in use.

* Fix an edge case in tcp_xmit_bandwidth_limit() where the 'ticks' global
could change values out from under the code. Load the global into a local
variable.

* Change the inflight code to use (t_srtt + t_rttvar) instead of
(t_srtt + t_rttbest) / 2.

This needs fine-tuning, the buffer is still too big. Expect more commits
later.

* Call sowwakeup() when appending a mbuf to a stream. The append can call
sbcompress() and make a stream buffer that has hit its mbuf limit writable
again.

* Remove the ssb_notify() macro and collapse the sorwakeup() and sowwakeup()
macros. They now just call sowakeup() on the appropriate sockbuf. The
notify test is now done in sowakeup().

show more ...


# 11b81f5d 16-Jul-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve TCP socket handling at high speeds

* Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress()
from collapsing more data into a mbuf.

* Rewrite sorecvtcp() (NOTE: sorecei

kernel - Improve TCP socket handling at high speeds

* Add M_SOLOCKED to mbuf->m_flags. This flag prevents sbcompress()
from collapsing more data into a mbuf.

* Rewrite sorecvtcp() (NOTE: soreceive() could use similar treatment).
Use M_SOLOCKED to freeze mbufs in the sockbuf with the rcvtok held,
then do the uiomove() loop WITHOUT the rcvtok held, then finalize
the disposal of the mbufs with rcvtok held.

This greatly reduces contention on rcvtok against the netisr threads
when reading large amounts of data at once and reduces cpu overhead
for netisr and user network threads.

* Change the default transmit ssb_lowat from ssb_hiwat / 2 to ssb_hiwat / 4.
The (previous) default maximum socket buffer size was 256KB. The default
lowat reduced the effective TCP transmit window to ~100KB. This can cause
severe buffering issues on GiGE links when multiple TCP streams are being
routed to the same cpu.

With this change the default max send buffer is ~180KB or so.

* Change the default kern.ipc.maxsockbuf from 256KB to 512KB. This
primarily effects auto-sizing of tcp buffers which in turn effects
most TCP connections.

This coupled with the hiwat fix greatly improves transmit throughput.

* Add more debugging info to the tcp inflight code.

show more ...


# fd27efb4 18-Jun-2014 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: {soabort,so_pru_abort}a -> {soabort,so_pru_abort}_async

No functional change. They are consistent w/ other so and so_pru
function names.


Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3
# dc71b7ab 31-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it:
http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadl

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Apparently everyone's doing it:
http://svnweb.freebsd.org/base?view=revision&revision=251069

Submitted-by: "Eitan Adler" <lists at eitanadler.com>

show more ...


Revision tags: v3.4.2
# 2702099d 06-May-2013 Justin C. Sherrill <justin@shiningsilence.com>

Remove advertising clause from all that isn't contrib or userland bin.

By: Eitan Adler <lists@eitanadler.com>


# 5337421c 02-May-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

netisr: Inline netisr_cpuport() and netisr_curport()

These two functions do nothing more than just return pointer to the
element in the array.

Per our header file naming convention, put these two f

netisr: Inline netisr_cpuport() and netisr_curport()

These two functions do nothing more than just return pointer to the
element in the array.

Per our header file naming convention, put these two functions in
net/netisr2.h

show more ...


# ec7f7fc8 28-Apr-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

netisr: Function renaming; no functional changes

This cleans up code for keeping input packets' hash instead of masking
the hash with ncpus2_mask. netisr_hashport(), which maps packet hash
to netis

netisr: Function renaming; no functional changes

This cleans up code for keeping input packets' hash instead of masking
the hash with ncpus2_mask. netisr_hashport(), which maps packet hash
to netisr port, will be added soon.

show more ...


Revision tags: v3.4.0, v3.4.1, v3.4.0rc, v3.5.0
# 901b9bd6 19-Mar-2013 Sepherosa Ziehau <sephe@dragonflybsd.org>

async_rcvd: Don't add/drop socket reference on hot path

Instead, add reference in tcp_attach(), and drop the reference in
tcp_close()


Revision tags: v3.2.2, v3.2.1, v3.2.0, v3.3.0
# 857e4745 14-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix unix domain socket portfn routing

* sonewconn_faddr() / sonewconn() was improperly overriding the sync_port
setting for unix domain sockets, causing unnecessary netmsg traffic to
th

kernel - Fix unix domain socket portfn routing

* sonewconn_faddr() / sonewconn() was improperly overriding the sync_port
setting for unix domain sockets, causing unnecessary netmsg traffic to
the netisr threads.

* This should significantly improve unix domain socket performance.

With-help-from: sephe

show more ...


# 3abced87 11-Sep-2012 Nuno Antunes <nuno.antunes@gmail.com>

netisr: rename cpu_portfn() to netisr_portfn().

No functional change.

Searched and replaced with:
find sys/ -type f -exec sed -i "" 's/cpu_portfn/netisr_portfn/g' '{}' \;


# 96c6eb29 03-Sep-2012 Sepherosa Ziehau <sephe@dragonflybsd.org>

tcp: Implement asynchronized pru_rcvd

This mainly avoids extra scheduling cost on the reception path due to
lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd.

Since TCP's pru_rcvd

tcp: Implement asynchronized pru_rcvd

This mainly avoids extra scheduling cost on the reception path due to
lwkt_domsg(). lwkt_sendmsg() is now used to carry out TCP pru_rcvd.

Since TCP's pru_rcvd could be batched, one pru_rcvd netmsg is embedded
into struct socket to avoid pru_rcvd netmsg allocation for each pru_rcvd,
and this netmsg will be used by lwkt_sendmsg(). Whether this embedded
pcu_rcvd netmsg should be sent or not is determined by its MSG_DONE bit.
Since user thread and netisr thread could be on different CPUs, the
embedded pru_rcvd netmsg's MSG_DONE bit is protected by a spinlock.

To cope with the following race that could drop window updates,
tcp_usr_rcvd() replies asynchronized rcvd netmsg before tcp_output():

netisr thread user thread

tcp_usr_rcvd() sorcvtcp()
{ {
tcp_output() :
: :
: sbunlinkmbuf()
: if (rcvd & MSG_DONE) (2)
: lwkt_sendmsg(rvcd)
: :
lwkt_replymsg(rcvd) (1)
}

At (2) window update is dropped, since rcvd netmsg is not replied yet at (1)

The result:
On i7-2600 (4C/8T, 3.4GHz):
32 parallel netperf -H 127.0.0.1 -t TCP_STREAM -P0 -l 30 (4 runs, unit: Mbps)

old 30253.88 30242.58 30162.55 30101.51
new 33962.74 33798.70 33499.92 33482.35

This gives ~12% performance improvement.

show more ...


Revision tags: v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 88da6203 29-Nov-2011 Sepherosa Ziehau <sephe@dragonflybsd.org>

accept: Save foreign address earlier, if protocol supports it

- Add so_faddr into socket, which records the accepted socket's foreign
address. If it is set, kern_accept() will use it directly ins

accept: Save foreign address earlier, if protocol supports it

- Add so_faddr into socket, which records the accepted socket's foreign
address. If it is set, kern_accept() will use it directly instead of
calling protocol specific method to extract the foreign address.
- Add protocol specific method, pru_safefaddr, which will save the
foreign address into socket.so_faddr if the necessary information is
supplied. This protocol method will only be called in protocol
thread.
- Pass the foreign address to sonewconn() if possible, so the foreign
address could be saved before the accepted socket is put onto the
complete list.

Currently only IPv4/TCP implemented pru_savefaddr

This intends to address the following problems:
- Calling pru_accept directly from user context is not MPSAFE, we
always races the socket.so_pcb check->use against protocol thread
clear/free socket.so_pcb, though the race window is too tiny to
be hit. To make it mpsafe, we should dispatch pru_accept to
protocol thread.
If socket.so_faddr is set here, we are race against nothing and
nothing expensive like put the current user thread into sleep will
happen. However, if the socket is dropped when it still sits
on the complete list, the error will not be timely delivered, i.e.
accept(2) will not return error, but the later on read(2)/write(2)
on the socket will deliver the error.
- Calling pru_accept directly races against the inpcb.inp_f{addr,port}
setting up in the protocol thread, since inpcb.inp_f{addr,port} is
setup _after_ the accepted socket was put onto the complete list.

user thread proto thread
: :
: accepted socket -> comp
: (inpcb.inp_f{addr,port} are 0 here)
comp -> socket :
pru_accept :
: setup inpcb.inp_f{addr,port}

Returning of 0.0.0.0:0 from accept(2) was observed on heavily loaded
web servers.

show more ...


# 86d7f5d3 26-Nov-2011 John Marino <draco@marino.st>

Initial import of binutils 2.22 on the new vendor branch

Future versions of binutils will also reside on this branch rather
than continuing to create new binutils branches for each new version.


# 15d2bc79 16-Nov-2011 Sepherosa Ziehau <sephe@dragonflybsd.org>

socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket

The soreserve and pru_attach could set these two flags internally,
so the original code will only retain those two flags but not cl

socket: Properly inherit AUTOLOWAT and AUTOSIZE from listen socket

The soreserve and pru_attach could set these two flags internally,
so the original code will only retain those two flags but not clear
them if the listen socket does not have them. We now explicitly
check those two flags and then set or clear them accordingly.

show more ...


1234