Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
9d4f17d1 | 10-Nov-2019 | zrj <rimvydas.jasinskas@gmail.com>
Adjust headers for <machine/stdint.h> visibility.
This also reduces namespace pollution a bit. Include <machine/stdint.h> where <stdint.h> is used too. An external compiler under -ffreestanding (__STDC_HOSTED__ == 0) will use its own <stdint.h> and will not include <machine/stdint.h>.
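The visibility rule can be sketched as the guard below; the __STDC_HOSTED__ test comes from the commit text, while the surrounding layout is only an assumption about how such a header is arranged, not the literal DragonFly header.

    /*
     * Hedged sketch of the visibility rule described above.
     */
    #if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 0
    /* Freestanding external compiler: it supplies its own <stdint.h>. */
    #else
    #include <machine/stdint.h>    /* hosted builds use the kernel's types */
    #endif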
e2164e29 | 18-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>
<sys/slaballoc.h>: Switch to lighter <sys/_malloc.h> header.
The <sys/globaldata.h> header embeds SLGlobalData, which in turn embeds struct malloc_type. Adjust several kernel sources that were missing includes where memory allocation is performed. Try to use alphabetical include order.
Now (in most cases) <sys/malloc.h> is included after <sys/objcache.h>. Once that is cleaned up, the <sys/malloc.h> inclusion could be moved out of <sys/idr.h> into the drm Linux compat layer's linux/slab.h without side effects.
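The point of the lighter header is that embedding a structure needs only its definition, not the allocator API. A minimal sketch of the split; the field names are illustrative, not the kernel's actual ones.

    /*
     * <sys/_malloc.h>-style "light" header sketch: just enough for other
     * headers to embed the structure.
     */
    struct malloc_type {
            struct malloc_type *ks_next;        /* list of malloc types */
            const char         *ks_shortdesc;   /* short description */
            /* ... statistics, limits ... */
    };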
bce6845a | 18-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>
kernel: Minor whitespace cleanup in a few sources.
Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
c1b91053 | 11-Aug-2015 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Reduce slab allocator fragmentation
* Restores the intent of the original z_Next test removed by the last commit and adjusts the related code comments. This allows fully free zones at the head to be moved to the free list as long as other slabs are present for the chunking.
* Switch the zone management from LIST to TAILQ so we can manipulate both the head and the tail of the list (see the sketch after this entry).
* Define the head of the zone list as holding the more 'mature' zones, potentially freeable by any code that tends to cycle allocations. The tail of the zone list holds less mature zones that are subject to reuse more quickly.
- The allocator allocates from the tail (least mature).
- Fully free zones are moved to the head (most mature).
- First free of a fully allocated zone relists the zone at the head (the zone is considered mature).
- Additional frees do not move the zone.
* TODO - We could also possibly shift the zone within the list based on NFree vs the NFree of adjacent zones, in order to heuristically allocate from the least-free zones and give the most-free zones a better chance to become fully free.
Reported-by: Adrian Drzewiecki <z@drze.net>
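A minimal sketch of that head/tail discipline using the standard <sys/queue.h> TAILQ macros; the structure and field names are illustrative, not the exact kernel ones.

    #include <sys/queue.h>

    struct slzone {
            TAILQ_ENTRY(slzone) z_entry;
            int                 z_NFree;
    };
    TAILQ_HEAD(slzone_head, slzone);

    /* Allocate from the tail: the least mature zone. */
    static struct slzone *
    zone_pick(struct slzone_head *head)
    {
            return (TAILQ_LAST(head, slzone_head));
    }

    /* A zone that becomes fully free moves to the head: most mature,
     * first in line to be reclaimed. */
    static void
    zone_mature(struct slzone_head *head, struct slzone *z)
    {
            TAILQ_REMOVE(head, z, z_entry);
            TAILQ_INSERT_HEAD(head, z, z_entry);
    }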
c06ca5ee | 10-Aug-2015 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove unused z_Next field, fix comments & debug helper
* The kernel slab allocator no longer uses the z_Next field, remove it.
* Remove a useless z_Next test in two places and adjust the comments to describe the actual operation of the zone free case. It doesn't hurt for us to leave one fully free zone structure on the main per-cpu ZoneAry[] for each zone; there might even be a cache-locality-of-reference advantage.
* Fix the 'zoneinfo' code in test/debug so it properly reports the kernel slab allocator's topology.
Reported-by: Adrian Drzewiecki <z@drze.net>
Revision tags: v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1
243dbb26 | 19-Nov-2014 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Improve slab cleanup performance
* Convert ZoneAry[], FreeZones, and FreeOVZones from singly linked lists to doubly linked LISTs. SLZone structure changes size but globaldata should stay the same.
* Primarily affects slab_cleanup(), which appears to be able to eat an excessive amount of cpu on monster systems (systems with large amounts of memory), and may fix a spin lock timeout panic.
* We may need some more work to moderate the amount of time slab_cleanup() takes.
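The performance point is that a doubly linked list can unlink an element in O(1) given only the element itself. A hedged sketch with <sys/queue.h>; the names are illustrative.

    #include <sys/queue.h>

    struct slzone {
            LIST_ENTRY(slzone) z_entry;    /* forward link plus back-pointer */
    };
    LIST_HEAD(slzone_list, slzone);

    /*
     * O(1) unlink: LIST_ENTRY carries a pointer back to the previous
     * link, so no scan is needed.  A singly linked list would have to
     * walk from the head to find the predecessor of z first.
     */
    static void
    zone_unlink(struct slzone *z)
    {
            LIST_REMOVE(z, z_entry);
    }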
Revision tags: v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
86d7f5d3 | 26-Nov-2011 | John Marino <draco@marino.st>
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0, v2.3.2, v2.3.1, v2.2.1, v2.2.0, v2.3.0, v2.1.1, v2.0.1
10cc6608 | 20-Jun-2005 | Matthew Dillon <dillon@dragonflybsd.org>
Include a bitmap of allocated entries when built with INVARIANTS. I considered making this a separate option but decided it was too important to leave out of a basic INVARIANTS build.
The kernel will panic if it tries to allocate memory that has already been allocated or free memory that has already been freed.
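A hedged sketch of the double-allocation/double-free check such a bitmap enables; the structure, macro names, and sizing below are hypothetical, not the kernel's actual code.

    #include <stdint.h>

    struct slzone_dbg {
            uint32_t z_Bitmap[4096 / 32];   /* one bit per chunk; sized
                                             * illustratively */
    };

    /* A set bit means "allocated". */
    #define CHUNK_TEST(z, i)  ((z)->z_Bitmap[(i) >> 5] &   (1U << ((i) & 31)))
    #define CHUNK_SET(z, i)   ((z)->z_Bitmap[(i) >> 5] |=  (1U << ((i) & 31)))
    #define CHUNK_CLEAR(z, i) ((z)->z_Bitmap[(i) >> 5] &= ~(1U << ((i) & 31)))

    /* On allocation: panic on a chunk already marked allocated.
     *      if (CHUNK_TEST(z, i)) panic("kmalloc: already allocated");
     *      CHUNK_SET(z, i);
     * On free: panic on a chunk already marked free.
     *      if (!CHUNK_TEST(z, i)) panic("kfree: already free");
     *      CHUNK_CLEAR(z, i);
     */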
8c10bfcf | 16-Jul-2004 | Matthew Dillon <dillon@dragonflybsd.org>
Update all my personal copyrights to the Dragonfly Standard Copyright.
2db3b277 | 12-Feb-2004 | Matthew Dillon <dillon@dragonflybsd.org>
Change lwkt_send_ipiq() and lwkt_wait_ipiq() to take a globaldata_t instead of a cpuid. This is part of an ongoing cleanup to use globaldata_t's to reference other cpus rather than their cpu numbers, reducing the number of serialized memory indirections required in a number of code paths and making more context available to the target code.
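The shape of the change, sketched as prototypes; ipifunc_t and the "bynum" name are assumptions for illustration, not the verified DragonFly declarations.

    struct globaldata;
    typedef struct globaldata *globaldata_t;
    typedef void (*ipifunc_t)(void *arg);   /* assumed callback shape */

    /* Before (conceptually): the target cpu named by number, forcing a
     * cpuid -> globaldata lookup inside the call. */
    int lwkt_send_ipiq_bynum(int cpuid, ipifunc_t func, void *arg);

    /* After: the caller passes the globaldata_t directly. */
    int lwkt_send_ipiq(globaldata_t gd, ipifunc_t func, void *arg);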
05220613 | 21-Nov-2003 | Matthew Dillon <dillon@dragonflybsd.org>
Do some fairly major include file cleanups to further separate the kernel from userland.
* Do not allow userland to include sys/proc.h directly; it must use sys/user.h instead. This is because sys/proc.h has a huge number of kernel header file dependencies.
* Do cleanups and work in lwkt_thread.c and lwkt_msgport.c to allow these files to be compiled directly into an upcoming userland thread support library.
* sys/lock.h is inappropriately included by a number of third-party programs so we can't disallow its inclusion, but do not include any kernel structures unless _KERNEL or _KERNEL_STRUCTURES is defined (see the guard sketch after this list).
* <ufs/ufs/inode.h> is often included by userland to get at the on-disk inode structure. Only include the on-disk components; do not include kernel structural components unless _KERNEL or _KERNEL_STRUCTURES is defined.
* Various usr.bin programs include sys/proc.h unnecessarily.
* The slab allocator has no concept of malloc buckets. Remove malloc bucket structures and VMSTAT support from the system.
* Make adjustments to sys/thread.h and sys/msgport.h such that the upcoming userland thread support library can include these files directly rather than copy them.
* Use low level __int types in sys/globaldata.h, sys/msgport.h, sys/slaballoc.h, sys/thread.h, and sys/malloc.h, instead of high level sys/types.h types, reducing include dependencies.
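The guard pattern referenced in the sys/lock.h and <ufs/ufs/inode.h> items; the macro names come straight from the commit text, while the layout is a minimal sketch.

    /* Declarations safe for everyone (on-disk formats, plain constants). */

    #if defined(_KERNEL) || defined(_KERNEL_STRUCTURES)
    /*
     * Kernel structural components: visible to the kernel itself and to
     * userland tools that explicitly opt in via _KERNEL_STRUCTURES.
     */
    #endif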
46a3f46d | 02-Oct-2003 | Matthew Dillon <dillon@dragonflybsd.org>
Fix a number of interrupt related issues.
* Don't access kernel_map in free(); defer such operations to malloc().
* Fix a slab allocator panic due to mishandling of malloc size slab limit checks on machines with small amounts of memory (the slab allocator reduces the size of the zone on low-memory machines but did not handle the reduced size properly).
* Add thread->td_nest_count to prevent splz recursions from underflowing the kernel stack. This can occur because we drop the critical section when calling sched_ithd() in order to allow it to preempt.
* Properly adjust intr_nesting_level around FAST interrupts
* Adjust the debugging printf() in lockmgr to only complain about blockable lock requests from interrupts.
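A hedged sketch of the td_nest_count guard; the field name comes from the commit, while the limit value and surrounding code are assumptions.

    struct thread {
            int td_nest_count;      /* splz() recursion depth */
    };

    extern void splz(void);         /* declaration assumed for the sketch */

    /* Bound the recursion that can occur because splz() drops the
     * critical section around sched_ithd(), allowing preemption. */
    static void
    splz_guarded(struct thread *td)
    {
            if (td->td_nest_count < 2) {    /* limit is an assumption */
                    ++td->td_nest_count;
                    splz();
                    --td->td_nest_count;
            }
    }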
1c5ca4f3 | 28-Aug-2003 | Matthew Dillon <dillon@dragonflybsd.org>
At Jeffrey Hsu's suggestion (he follows USENIX papers far more closely than I do), change the offset for new allocations out of each new zone we create in order to spread out L1/L2 cache use. Without this, new allocations tend to front-load the cpu caches, resulting in non-optimal memory accesses.
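A cache-coloring sketch of the idea; every name and the color step below are illustrative, not the committed code.

    #include <stddef.h>

    static size_t zone_color;       /* advanced once per new zone */

    /*
     * Stagger the starting offset of each newly created zone so the
     * first allocations from successive zones land on different
     * L1/L2 cache lines.
     */
    static void *
    zone_first_chunk(char *zone_base, size_t align)
    {
            void *p = zone_base + zone_color;

            zone_color = (zone_color + align) & (4096 - 1); /* wrap color */
            return (p);
    }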
c7158d94 | 27-Aug-2003 | Matthew Dillon <dillon@dragonflybsd.org>
oops. Forgot a commit.
a108bf71 | 27-Aug-2003 | Matthew Dillon <dillon@dragonflybsd.org>
SLAB ALLOCATOR Stage 1. This brings in a slab allocator written from scratch by yours truly. A detailed explanation of the allocator is included, but first, other changes:
* Instead of having vm_map_entry_insert*() and friends allocate the vm_map_entry structures, a new mechanism has been put in place whereby the vm_map_entry structures are reserved at a higher level and then expected to exist in the free pool in deep vm_map code. This preliminary implementation may eventually turn into something more sophisticated that includes things like pmap entries and so forth. The idea is to convert what should be low level routines (VM object and map manipulation) back into low level routines.
* vm_map_entry structures are now cached per-cpu, which is integrated into the reservation model above.
* The zalloc 'kmapentzone' has been removed. We now only have 'mapentzone'.
* There were race conditions between vm_map_findspace() and actually entering the map_entry with vm_map_insert(). These have been closed through the vm_map_entry reservation model described above.
* Two new kernel config options now work. NO_KMEM_MAP has been fleshed out a bit more, and a number of deadlocks related to having only the kernel_map have now been fixed. The USE_SLAB_ALLOCATOR option will cause the kernel to compile-in the slab allocator instead of the original malloc allocator. If you specify USE_SLAB_ALLOCATOR you must also specify NO_KMEM_MAP.
* vm_poff_t and vm_paddr_t integer types have been added. These are meant to represent physical addresses and offsets (physical memory might be larger than virtual memory, for example with Intel PAE). They are not heavily used yet, but the intention is to separate physical representation from virtual representation.
SLAB ALLOCATOR FEATURES
The slab allocator breaks allocations up into approximately 80 zones based on their size. Each zone has a chunk size (alignment). For example, all allocations in the 1-8 byte range will allocate in chunks of 8 bytes. Each size zone is backed by one or more blocks of memory. The size of these blocks is fixed at ZoneSize, which is calculated at boot time to be between 32K and 128K. The use of a fixed block size allows us to locate the zone header given a memory pointer with a simple masking operation.
The slab allocator operates on a per-cpu basis. The cpu that allocates a zone block owns it. free() checks the cpu that owns the zone holding the memory pointer being freed and forwards the request to the appropriate cpu through an asynchronous IPI. This request is not currently optimized but it can theoretically be heavily optimized ('queued') to the point where the overhead becomes inconsequential. As of this commit the malloc_type information is not MP safe, but the core slab allocation and deallocation algorithms, not including having to allocate the backing block, *ARE* MP safe. The core code requires no mutexes or locks, only a critical section.
Each zone contains N allocations of a fixed chunk size. For example, a 128K zone can hold approximately 16000 or so 8 byte allocations. The zone is initially zero'd and new allocations are simply allocated linearly out of the zone. When a chunk is freed it is entered into a linked list and the next allocation request will reuse it. The slab allocator heavily optimizes M_ZERO operations at both the page level and the chunk level.
The slab allocator maintains various undocumented malloc quirks such as ensuring that small power-of-2 allocations are aligned to their size, and malloc(0) requests are also allowed and return a non-NULL result. kern_tty.c depends heavily on the power-of-2 alignment feature and ahc depends on the malloc(0) feature. Eventually we may remove the malloc(0) feature.
PROBLEMS AS OF THIS COMMIT
NOTE! This commit may destabilize the kernel a bit. There are issues with the ISA DMA area ('bounce' buffer allocation) due to the large backing block size used by the slab allocator, and there are probably some deadlock issues due to the removal of kmem_map that have not yet been resolved.
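The fixed-block-size trick described under SLAB ALLOCATOR FEATURES reduces pointer-to-zone lookup to a single mask. A minimal sketch; the constant values are illustrative, since the real ZoneSize is computed at boot.

    #include <stdint.h>

    #define ZONE_SIZE (128 * 1024)        /* illustrative; really 32K-128K */
    #define ZONE_MASK (ZONE_SIZE - 1)

    struct slzone;                        /* header lives at the zone base */

    /* Power-of-two zone size: mask any chunk pointer to find its header. */
    static struct slzone *
    ptr_to_zone(void *ptr)
    {
            return ((struct slzone *)((uintptr_t)ptr & ~(uintptr_t)ZONE_MASK));
    }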
bbb201fd | 15-Feb-2011 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Add options SLAB_DEBUG to help debug memory corruption
* Adding options SLAB_DEBUG to your kernel config will reconfigure kmalloc(), krealloc(), and kstrdup() to record all allocation sources (file and line number) on a zone-by-zone basis.
A full kernel recompile is needed when you add or drop this option from your kernel config.
* Limited to 32 slots per slab. Since slabs offer a narrow range of chunk sizes this will normally be sufficient.
* When a memory corruption related panic occurs kgdb can be used to determine who allocated out of the slab in question.
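A typical file/line capture pattern for what SLAB_DEBUG records; the *_debug name and prototype are hypothetical stand-ins, not the verified kernel API.

    struct malloc_type;

    #ifdef SLAB_DEBUG
    void *kmalloc_debug(unsigned long size, struct malloc_type *type,
                        int flags, const char *file, int line);
    #define kmalloc(size, type, flags) \
            kmalloc_debug(size, type, flags, __FILE__, __LINE__)
    #endif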
df9daea8 | 30-Sep-2010 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix MP race in kmalloc/kfree
* Fix two cases where a zone is mishandled by the new kfree(). Note, however, that the race being fixed is nearly impossible (might even BE impossible) to produce, because it requires a slab to go from completely empty to completely full through hysteresis and then be destroyed, all in a few microseconds.
Essentially, when a kfree() occurs on a cpu that is not the owner of the zone, the chunk is linked into a side-list on the zone using atomic ops. Under certain (very rare) circumstances the cpu doing the kfree() must IPI the cpu that owns the zone.
The moment the chunk is linked in, the cpu owning the zone can race the incoming IPI and destroy the zone (if it is now completely unused). The old kmemusage code handled the race just fine, but the new vm_page_t based big-block handler could not.
The solution is to have an atomic-ops counter for inflight IPIs which prevents the owning cpu from destroying the zone prematurely.
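A hedged sketch of that in-flight counter; the header choice, field name, and helper names are assumptions, while atomic add/subtract primitives of this shape do exist in the BSD atomic API.

    #include <machine/atomic.h>     /* header choice is an assumption */

    struct slzone_mp {
            volatile unsigned int z_RCount; /* in-flight free IPIs
                                             * (field name assumed) */
    };

    /* The freeing cpu pins the zone before sending the IPI... */
    static void
    zone_pin(struct slzone_mp *z)
    {
            atomic_add_int(&z->z_RCount, 1);
    }

    /* ...and the IPI handler on the owning cpu unpins it.  The owner
     * must not destroy a fully free zone while z_RCount != 0. */
    static void
    zone_unpin(struct slzone_mp *z)
    {
            atomic_subtract_int(&z->z_RCount, 1);
    }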
5fee07e6 | 18-Sep-2010 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Optimize kfree() to greatly reduce IPI traffic
* Instead of IPIing the chunk being freed to the originating cpu we use atomic ops to directly link the chunk onto the target slab. We then notify the target cpu via an IPI message only in the case where we believe the slab has to be entered back onto the target cpu's ZoneAry.
This reduces the IPI messaging load by a factor of 100x or more. kfree() sends virtually no IPIs any more.
* Move malloc_type accounting to the cpu issuing the kmalloc or kfree (kfree used to forward the accounting to the target cpu). The accounting is done using the per-cpu malloc_type accounting array so large deltas will likely accumulate, but they should all cancel out properly in the summation.
* Use the kmemusage array and kup->ku_pagecnt to track whether a SLAB is active or not, which allows the handler for the asynchronous IPI to validate that the SLAB still exists before trying to access it.
This is necessary because once the cpu doing the kfree() successfully links the chunk into z_RChunks, the target slab can get ripped out from under it by the owning cpu.
* The special cpu-competing linked list is different from the linked list normally used to find free chunks, so the localized code and the MP code are segregated.
We pay special attention to list ordering to try to avoid unnecessary cache mastership changes, though it should be noted that the c_Next link field in the chunk creates an issue no matter what we do.
A 100% lockless algorithm is used. atomic_cmpset_ptr() is used to manage the z_RChunks singly-linked list.
* Remove the page localization code for now. For the typical lifetime of a chunk of memory I don't think this provided much of an advantage.
Prodded-by: Venkatesh Srinivas
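A sketch of the lockless producer push onto z_RChunks; atomic_cmpset_ptr() is named in the commit, but its exact signature and the types here are assumptions (written FreeBSD-style over uintptr_t).

    #include <stdint.h>
    #include <machine/atomic.h>     /* header choice is an assumption */

    struct slchunk {
            struct slchunk *c_Next;
    };

    /* Push a freed chunk onto the zone's remote-free list without locks:
     * retry the compare-and-set until no other cpu raced us. */
    static void
    rchunk_push(struct slchunk * volatile *head, struct slchunk *chunk)
    {
            struct slchunk *old;

            do {
                    old = *head;
                    chunk->c_Next = old;    /* link before publishing */
            } while (!atomic_cmpset_ptr((volatile uintptr_t *)head,
                (uintptr_t)old, (uintptr_t)chunk));
    }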