uipc_mbuf.c - OpenGrok history log for /openbsd/sys/kern/uipc

Revision	Date	Author	Comments
# 0bf85d46	05-Mar-2024	bluhm <bluhm@openbsd.org>	Revert m_defrag() mbuf alignment to IP header. m_defrag() is intended as a last resort to make DMA transfers to the hardware. Therefore page alignment is more important than IP header alignment. The reason the mbuf returned by m_defrag() was switched to IP header alignment was that ether_extract_headers() failed in the em(4) driver with TSO on sparc64. This has been fixed by using memcpy(). The alignment change in m_defrag() comes too late in the 7.5 release process. It may affect several drivers on different architectures. Bus dmamap for ixl(4) on sun4v expects page alignment. Such alignment issues and TSO mbuf mapping for IOMMU need more thought. OK deraadt@
# e78015d9	21-Feb-2024	claudio <claudio@openbsd.org>	Keep mbuf data alignment intact in m_defrag(). The recent TSO support in em(4) triggered an alignment error on the TCP header. In em(4), m_defrag() is called before setting up the TSO DMA bits, and with that the TCP header was suddenly no longer aligned. Like other mbuf functions, preserve the data alignment in m_defrag() to prevent such unaligned packets. With help and OK bluhm@ mglocker@
# 7b4d35e0	20-Oct-2023	bluhm <bluhm@openbsd.org>	Avoid assertion failure when splitting mbuf cluster. m_split() calls m_align() to initialize the data pointer of a newly allocated mbuf. If the new mbuf will be converted to a cluster, this is not necessary. If additionally the new mbuf is larger than MLEN, this can lead to a panic. Only call m_align() when a valid m_data is needed. This is the case if we do not reference the existing cluster, but memcpy() the data into the new mbuf. Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com OK claudio@ deraadt@
# 6d14abdc	23-Jun-2023	gnezdo <gnezdo@openbsd.org>	Avoid division by 0 in m_pool_used OK dlg@ Reported-by: syzbot+a377d5cd833c2343429a@syzkaller.appspotmail.com
# 1d2d8e40	16-May-2023	mvs <mvs@openbsd.org>	Always set the maximum queue length to the passed-in value in the IFQCTL_MAXLEN case. This is not the fast path, so dropping the mq->mq_maxlen check doesn't introduce any performance impact, but makes the code MP consistent. Discussed with and ok from bluhm@
# 16d357f8	05-May-2023	bluhm <bluhm@openbsd.org>	The mbuf_queue API allows read access to integer variables which another CPU may change simultaneously. To prevent misoptimisation by the compiler, they need the READ_ONCE() macro. Otherwise there could be two read operations with inconsistent values. Writing to the integer in mq_set_maxlen() needs mutex protection. Otherwise the value could change within critical sections. Again, the compiler could optimize to multiple read operations within the critical section. With inconsistent values, the behavior is undefined. OK dlg@
# 0d280c5f	14-Aug-2022	jsg <jsg@openbsd.org>	remove unneeded includes in sys/kern ok mpi@ miod@
# 7eb8d89d	22-Feb-2022	guenther <guenther@openbsd.org>	Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h> net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
# c8502062	14-Feb-2022	dlg <dlg@openbsd.org>	update sbchecklowmem() to better detect actual mbuf memory usage. previously sbchecklowmem() (and sonewconn()) would look at the mbuf and mbuf cluster pools to see if they were approaching their hard limits. based on how many mbufs/clusters were allocated against the limits, socket operations would start to fail with ENOBUFS until utilisation went down. mbufs and clusters have changed a lot since then though. there are now many mbuf cluster pools, not just one for 2k clusters. because of this the mbuf layer now limits the amount of backend page memory all the mbuf pools can allocate, rather than limiting the individual pools. this means sbchecklowmem() ends up looking at the default pool hard limit, which is UINT_MAX, which in turn means sbchecklowmem() probably never applies backpressure. this is made worse on multiprocessor systems where per-cpu caches of mbuf and cluster pool items are enabled, because the number of in-use pool items is distorted by the cpu caches. this switches sbchecklowmem to looking at the page allocations made by all the pools instead. the big benefit of this is that the page allocations are much more representative of the overall mbuf memory usage in the system. the downside is that the backend page allocation accounting does not see idle memory held by pools. pools cannot release partially free pages to the page backend (obviously), and pools cache idle items to avoid thrashing on the backend page allocator. this means the page allocation level is higher than the memory used by actual in-flight mbufs. however, this can also be a benefit. the backend page allocation is a kind of smoothed out "trend" line.
mbuf utilisation over short periods can be extremely bursty because of things like rx ring dequeue and fill cycles, or large socket sends. if you're trying to grow socket buffers while these things are happening, luck becomes an important factor in whether it will work or not. because pools cache idle items, the backend page utilisation better represents the overall trend of activity in the system and will give more consistent behaviour here. this diff is deliberately simple. we're basically going from "no limits" to "some sort of limit" for sockets again, so keeping the code simple means it should be easy to understand and tweak in the future. ok djm@ visa@ claudio@
# 75fb34f8	08-Feb-2022	dlg <dlg@openbsd.org>	use sizeof(long) - 1 in m_pullup to determine payload alignment. this makes it consistent with the rest of the network stack when determining alignment. ok bluhm@
# 2dffa172	18-Jan-2022	bluhm <bluhm@openbsd.org>	Properly handle read-only clusters in m_pullup(9). If the first mbuf of a chain in m_pullup is a cluster, check if the cluster is read-only (shared or an external buffer). If so, don't touch it and create a new mbuf for the pullup data. This restores the original 4.4BSD m_pullup behaviour, which not only returned contiguous mbuf data of the specified length, but also converted read-only clusters into writable memory. The latter feature was lost during some refactoring. from ehrhardt@; tested by weerd@; OK stsp@ bluhm@ claudio@
# 0bca52fc	06-Mar-2021	jsg <jsg@openbsd.org>	ansi
# bbe404a3	25-Feb-2021	dlg <dlg@openbsd.org>	let m_copydata use a void * instead of caddr_t i'm not a fan of having to cast to caddr_t when we have modern inventions like void *s we can take advantage of. ok claudio@ mvs@ bluhm@
# 1284ddab	13-Jan-2021	bluhm <bluhm@openbsd.org>	Convert mbuf type KDASSERT() to a proper KASSERT() in m_get(9). This should prevent using an uninitialized value as a bogus counter index. OK mvs@ claudio@ anton@
# 471f2571	12-Dec-2020	jan <jan@openbsd.org>	Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp. OK dlg@, bluhm@ No Opinion mpi@ Not against it claudio@
# 6281916f	21-Jun-2020	dlg <dlg@openbsd.org>	add mq_push. it's like mq_enqueue, but drops from the head, not the tail. from Matt Dunwoodie and Jason A. Donenfeld
# 50ec4b0e	22-Jan-2020	dlg <dlg@openbsd.org>	add ml_hdatalen and mq_hdatalen as workalikes of ifq_hdatalen. this is so pppx(4) and the upcoming pppac(4) can give kq read data and FIONREAD values that make sense, like the ones tun(4) and tap(4) provide with ifq_hdatalen.
# 08a1c293	22-Oct-2019	bluhm <bluhm@openbsd.org>	Replace the mutex that protects the mbuf allocation limit by an atomic operation. OK visa@ cheloha@
# 4286c7cf	19-Jul-2019	bluhm <bluhm@openbsd.org>	After the kernel has reached the sysctl kern.maxclusters limit, operations get stuck while holding the net lock. Increasing the limit did not help as there was no wakeup of the waiting pools. So introduce pool_wakeup() and run through the mbuf pool request list when the limit changes. OK dlg@ visa@
# bb127fc6	16-Jul-2019	bluhm <bluhm@openbsd.org>	Fix uipc white spaces.
# 9392a735	16-Jul-2019	bluhm <bluhm@openbsd.org>	Prevent integer overflow in kernel and userland when checking mbuf limits. Convert kernel variables and calculations for mbuf memory into long to allow larger values on 64-bit machines. Put a range check into the kernel sysctl. For the interface itself, int is still sufficient. In netstat -m, cast all multiplications to unsigned long to hold the product of two unsigned ints. input and OK visa@
# 5bac5b4f	10-Jun-2019	dlg <dlg@openbsd.org>	add m_microtime for getting the wall clock time associated with a packet. if the packet has the M_TIMESTAMP csum_flag, ph_timestamp is added to the boottime clock; otherwise it just uses microtime().
# 4bfbad54	10-Feb-2019	tedu <tedu@openbsd.org>	revert revert revert. there are many other archs that use custom allocs.
# 079cc439	10-Feb-2019	tedu <tedu@openbsd.org>	make it possible to reduce kmem pressure by letting some pools use a more accommodating allocator. an interrupt safe pool may also be used in process context, as indicated by waitok flags. thanks to the garbage collector, we can always free pages in process context. the only complication is where to put the pages. solve this by saving the allocation flags in the pool page header so the free function can examine them. not actually used in this diff. (coming soon.) arm testing and compile fixes from phessler
# 895c84ff	01-Feb-2019	dlg <dlg@openbsd.org>	make m_pullup use the first mbuf with data to measure alignment. this fixes an issue found by a regress test on sparc64 by claudio, and between us took about half a day of work to understand and fix at a2k19. ok claudio@