History log of /dragonfly/sys/platform/vkernel64/include/pmap.h (Results 1 – 23 of 23)
Revision	Date	Author	Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 8078b160 03-Jun-2021 Aaron LI <aly@aaronly.me>

pmap: Eliminate a simple macro 'pte_load_clear()'

First, this macro is not used in vkernel64's pmap code. Second, it
appears out of place and looks unrelated to the rest of the pmap.h
header. So just substitute it in the pmap code and get rid of it.
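
For context, a pte_load_clear()-style macro usually just wraps an atomic
read-and-clear of the PTE, so call sites can use the primitive directly.
A minimal sketch, assuming the usual atomic_readandclear_long() form (the
exact definition being removed is not shown in this log):

    /* assumed shape of the removed macro */
    #define pte_load_clear(ptep)    atomic_readandclear_long(ptep)

    /* after the substitution, call sites simply do */
    pt_entry_t opte = atomic_readandclear_long(ptep);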



# 7e0dbbc6 03-Jun-2021 Aaron LI <aly@aaronly.me>

vm/pmap.h: Move vtophys() and vtophys_pte() macros here

The two macros are defined in terms of pmap_kextract(), which is also
declared in this header file, so it is a better place to hold the two
macros.

In addition, this adjustment avoids duplicating them in both pc64 and
vkernel64.
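
For reference, both macros are thin wrappers around pmap_kextract(); a
minimal sketch of their likely form (the real definitions are the ones
now living in <vm/pmap.h>):

    #define vtophys(va)     pmap_kextract((vm_offset_t)(va))
    #define vtophys_pte(va) ((pt_entry_t)pmap_kextract((vm_offset_t)(va)))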



# c713db65 24-May-2021 Aaron LI <aly@aaronly.me>

vm: Change 'kernel_pmap' global to pointer type

Following the previous commits, this commit changes 'kernel_pmap' to a
pointer of type 'struct pmap *'. This aligns it better with
'kernel_map' and simplifies the code a bit.

No functional changes.
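
In header terms the change amounts to the following (a sketch, not the
literal diff); call sites that used to pass &kernel_pmap now pass the
pointer directly:

    /* before: a global object, callers take its address */
    extern struct pmap   kernel_pmap;

    /* after: a global pointer, matching the 'kernel_map' convention */
    extern struct pmap   *kernel_pmap;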



Revision tags: v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# da82a65a 11-Nov-2019 zrj <rimvydas.jasinskas@gmail.com>

Add <sys/cpumask.h>.

Collect and gather all scattered cpumask bits into the correct headers.
This cleans up the namespace and simplifies platform handling in asm macros.
The cpumask_t together with its macros is already a non-MI feature that is
used in userland utilities, libraries, the kernel scheduler and syscalls.
It deserves a sys/ header. Adjust syscalls.master and rerun sysent.

While there, fix an issue in ports that set a POSIX env but have an
implementation of setting thread names through pthread_set_name_np().



# b7d3e109 15-Sep-2019 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Add MD_PAGE_FREEABLE() dummy macro

* Add a dummy macro for MD_PAGE_FREEABLE() so the vkernel builds.

* Fix vkernel compile error due to st_blksize size change.

Reported-by: swildner
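
A dummy macro of this kind typically just reports every page as freeable,
since the vkernel keeps no per-page MD state to consult; a plausible
sketch (the actual stub may differ):

    #define MD_PAGE_FREEABLE(m)     1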


Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0
# ae442b2e 17-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 10 - Precursor work for terminal pv_entry removal

* Effectively remove pmap_track_modified(). Turn it into an assertion.
The normal pmap code should NEVER EVER be called with any range inside
the clean map.

This assertion, and the routine in its entirety, will be removed in a
later commit.

* The purpose of the original code was to prevent buffer cache kvm mappings
from being misinterpreted as contributing to the underlying vm_page's
modified state. Normal paging operation synchronizes the modified bit and
then transfers responsibility to the buffer cache. We didn't want
manipulation of the buffer cache to further affect the modified bit for
the page.

In modern times, the buffer cache does NOT use a kernel_object based
mapping for anything and there should be no chance of any kernel related
pmap_enter() (entering a managed page into the kernel_pmap) from messing
with the space.
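
The "turn it into an assertion" shape described above is roughly the
following, assuming the usual clean_sva/clean_eva bounds of the clean
map (illustrative only, not the literal patch):

    static __inline int
    pmap_track_modified(vm_offset_t va)
    {
            /* normal pmap code must never see a clean-map address */
            KKASSERT(va < clean_sva || va >= clean_eva);
            return 1;
    }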



Revision tags: v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 95270b7e 01-Feb-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Many fixes for vkernel support, plus a few main kernel fixes

REAL KERNEL

* The big enchilada is that the main kernel's thread switch code has
a small timing window where it clears the PM_ACTIVE bit for the cpu
while switching between two threads. However, it *ALSO* checks and
avoids loading the %cr3 if the two threads have the same pmap.

This results in a situation where an invalidation on the pmap in another
cpu may not have visibility to the cpu doing the switch, and yet the
cpu doing the switch also decides not to reload %cr3 and so does not
invalidate the TLB either. The result is a stale TLB and bad things
happen.

For now just unconditionally load %cr3 until I can come up with code
to handle the case.

This bug is very difficult to reproduce on a normal system, it requires
a multi-threaded program doing nasty things (munmap, etc) on one cpu
while another thread is switching to a third thread on some other cpu.

* KNOTE after handling the vkernel trap in postsig() instead of before.

* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs
argument instead of a 32-bit npgs argument. This fixes situations
that crop up when a process uses more than 16TB of address space.

* Add an lfence to the pmap invalidation code that I think might be
needed.

* Handle some wrap/overflow cases in pmap_scan() related to the use of
large address spaces.

* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.

* Test PG_RW after locking the pv_entry to handle potential races.

* Add bio_crc to struct bio. This field is only used for debugging for
now but may come in useful later.

* Add some global debug variables in the pmap_inval_smp() and related
paths. Refactor the npgs handling.

* Load the tsc_target field after waiting for completion of the previous
invalidation op instead of before. Also add a conservative mfence()
in the invalidation path before loading the info fields.

* Remove the global pmap_inval_bulk_count counter.

* Adjust swtch.s to always reload the user process %cr3, with an
explanation. FIXME LATER!

* Add some test code to vm/swap_pager.c which double-checks that the page
being paged out does not get corrupted during the operation. This code
is #if 0'd.

* We must hold an object lock around the swp_pager_meta_ctl() call in
swp_pager_async_iodone(). I think.

* Reorder when PG_SWAPINPROG is cleared. Finish the I/O before clearing
the bit.

* Change the vm_map_growstack() API to pass a vm_map in instead of
curproc.

* Use atomic ops for vm_object->generation counts, since objects can be
locked shared.

VKERNEL

* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN.
This solves a severe FP corruption bug in the vkernel due to calls it
makes into libc (which uses %xmm registers all over the place).

This is not a complete fix. We need a formal userspace/kernelspace FP
abstraction. Right now the vkernel doesn't have a kernelspace FP
abstraction so if a kernel thread switches preemptively bad things
happen.

* The kernel tracks and locks pv_entry structures to interlock pte's.
The vkernel never caught up, and does not really have a pv_entry or
placemark mechanism. The vkernel's pmap really needs a complete
re-port from the real-kernel pmap code. Until then, we use poor hacks.

* Use the vm_page's spinlock to interlock pte changes.

* Make sure that PG_WRITEABLE is set or cleared with the vm_page
spinlock held.

* Have pmap_clearbit() acquire the pmobj token for the pmap in the
iteration. This appears to be necessary, currently, as most of the
rest of the vkernel pmap code also uses the pmobj token.

* Fix bugs in the vkernel's swapu32() and swapu64().

* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy
the page. Note however that a page table page is currently never
soft-busied. Also adjust other vkernel code that busies a page table page.

* Fix some silly code in a pmap->pm_ptphint test.

* Don't inherit e.g. PG_M from the previous pte when overwriting it
with a pte of a different physical address.

* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW
(which also clears VPTE_M), and not just VPTE_M. Formally we want
the vkernel to be notified when a page becomes modified and it won't
be unless we also clear VPTE_RW and force a fault. <--- I may change
this back after testing.

* Wrap pmap_replacevm() with a critical section.

* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle
it (vm_fault_page() just now got the ability).

* Properly flag VM_FAULT_USERMODE.



# c50e690b 30-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Change how VPTE_M works to fix seg-faults during paging

* Properly set and clear PG_WRITEABLE

* TAILQ_FOREACH() iterations on m->md.pv_list must be restarted
if we ever drop the spin-lock.

* Change VPAGETABLE semantics and operation, cleaning up some things
and fixing others.

Have the real-kernel only conditionally downgrade the real pte to
read-only for a VPTE_RW vpte. It only downgrades it if VPTE_M is
not set, improving performance.

Fix the virtual kernel to properly invalidate the real-kernel pte's
when clearing VPTE_M. This improves issues that crop up when the
vkernel is paging heavily.

* Replace the linear pv_plist with a RB tree. Also have pmap_remove_pages()
simply call pmap_remove().

Note that pmap_remove_pages()'s old code was broken because it only
scanned the pv_entry list and missed unmanaged pages. Fixing this
also fixes a vmspace reuse issue where the real-host pmap still
contained stale PTEs from prior use.
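
The conditional downgrade described above boils down to logic along these
lines; VPTE_RW and VPTE_M are the vkernel's vpte bits, while the helper
name is a hypothetical stand-in for the real kernel's PTE manipulation:

    if ((vpte & VPTE_RW) && (vpte & VPTE_M) == 0) {
            /*
             * Writable in the vkernel but not yet marked modified: keep
             * the backing real PTE read-only so the first write faults
             * and the host can set VPTE_M before granting write access.
             */
            real_pte_downgrade_ro(va);      /* hypothetical helper */
    }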



# c78d5661 26-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Refactor pmap

* Refactor the pmap code. Use vm_page locking to protect PTEs.

* Change the accounting from using vm_page->hold_count to using
vm_page->wire_count.

* Replace unlocked pt/pd/pdp lookups with explicit page tests for non-kernel
pmaps.



# 76f1911e 23-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - pmap and vkernel work

* Remove the pmap.pm_token entirely. The pmap is currently protected
primarily by fine-grained locks and the vm_map lock. The intention
is to eventually be able to protect it without the vm_map lock at all.

* Enhance pv_entry acquisition (representing PTE locations) to include
a placemarker facility for non-existent PTEs, allowing the PTE location
to be locked whether a pv_entry exists for it or not.

* Fix dev_dmmap (struct dev_mmap) (for future use), it was returning a
page index for physical memory as a 32-bit integer instead of a 64-bit
integer.

* Use pmap_kextract() instead of pmap_extract() where appropriate.

* Put the token contention test back in kern_clock.c for real kernels
so token contention shows up as sys% instead of idle%.

* Modify the pmap_extract() API to also return a locked pv_entry,
and add pmap_extract_done() to release it. Adjust users of
pmap_extract().

* Change madvise/mcontrol MADV_INVAL (used primarily by the vkernel)
to use a shared vm_map lock instead of an exclusive lock. This
significantly improves the vkernel's performance and significantly
reduces stalls and glitches when typing in one under heavy loads.

* The new placemarkers also have the side effect of fixing several
difficult-to-reproduce bugs in the pmap code, by ensuring that
shared and unmanaged pages are properly locked whereas before only
managed pages (with pv_entry's) were properly locked.

* Adjust the vkernel's pmap code to use atomic ops in numerous places.

* Rename the pmap_change_wiring() call to pmap_unwire(). The routine
was only being used to unwire (and could only safely be called for
unwiring anyway). Remove the unused 'wired' and the 'entry'
arguments.

Also change how pmap_unwire() works to remove a small race condition.

* Fix race conditions in the vmspace_*() system calls which could lead
to pmap corruption. Note that the vkernel did not trigger any of
these conditions, I found them while looking for another bug.

* Add missing maptypes to procfs's /proc/*/map report.



# 534ee349 28-Dec-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement RLIMIT_RSS, Increase maximum supported swap

* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
vm_page_queues[] array).

* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.

* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).

* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.

* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
reallocatable.

* Refactor the vm_pageout_deficit variable, using atomic ops.

* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.

* Refactor kern/subr_blist.c and all associated code that uses it to
increase swblk_t from int32_t to int64_t, and to increase the supported
radix from 31 bits to 63 bits.

This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.

* There are two kernel data structures associated with swap. The blmeta
structure which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure whose KVA is reserved
but not allocated.

The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_object's. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to actually do that.

A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with > 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.

* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
for.
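
The swmeta sizing rule above reduces to a simple formula; a small worked
sketch using the numbers quoted in the text:

    /*
     * swmeta tracks swap at roughly a 1:341 (ram:swap) ratio and the
     * kernel caps it at about half of physical memory, so:
     *
     *      max_swap ~= (ram / 2) * 341
     *
     *      16 GB ram   -> 16/2*341  ~= 2.7 TB
     *      128 GB ram  -> 128/2*341 ~= 21 TB
     *      >512 GB ram -> KVM-capped at 256/2*341 ~= 43 TB
     */
    static uint64_t
    max_swap_bytes(uint64_t ram_bytes)
    {
            return (ram_bytes / 2) * 341;
    }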



Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 2f0acc22 17-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve physio performance

* See http://apollo.backplane.com/DFlyMisc/nvme_sys03.txt

* Hash the pbuf system. This chops down spin-lock collisions
at high transaction rates (>150K IOPS) by 1000x.

* Implement a pbuf with pre-allocated kernel memory that we
copy into, avoiding page table manipulations and thus
avoiding system-wide invltlb/invlpg IPIs.

* This increases NVMe IOPS tests with three cards from
150K-200K IOPS to 950K IOPS using physio (random read,
4K blocks, from urandom-filled partition, with many
process threads, from 3 NVMe cards in parallel).

* Further adjustments to the vkernel build.



Revision tags: v4.4.3, v4.4.2
# 2c64e990 25-Jan-2016 zrj <rimvydas.jasinskas@gmail.com>

Remove advertising header from sys/

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Some less clear cases were handled the same way as was done in FreeBSD.


Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# cc694a4a 30-Jun-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Move CPUMASK_LOCK out of the cpumask_t

* Add cpulock_t (a 32-bit integer on all platforms) and implement
CPULOCK_EXCL as well as space for a counter.

* Break out CPUMASK_LOCK, add a new field to the pmap (pm_active_lock)
and to the process vmm (p_vmm_cpulock), and implement the mmu interlock
there.

The VMM subsystem uses additional bits in cpulock_t as a mask counter
for implementing its interlock.

The PMAP subsystem just uses the CPULOCK_EXCL bit in pm_active_lock for
its own interlock.

* Max cpus on 64-bit systems is now 64 instead of 63.

* cpumask_t is now just a pure cpu mask and no longer requires all-or-none
atomic ops, just normal bit-for-bit atomic ops. This will allow us to
hopefully extend it past the 64-cpu limit soon.



Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2
# 381fa6da 02-Mar-2014 Sascha Wildner <saw@online.de>

Some fixes to allow building with gcc44.

Most of them for type redefinitions which gcc47 has stopped warning about
(if they are compatible).

The libstdc++ fix is modeled after gcc47's libstdc++. We don't have
__libc_C_ctype_[] anymore.



Revision tags: v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0
# a86ce0cd 20-Sep-2013 Matthew Dillon <dillon@apollo.backplane.com>

hammer2 - Merge Mihai Carabas's VKERNEL/VMM GSOC project into the main tree

* This merge contains work primarily by Mihai Carabas, with some misc
fixes also by Matthew Dillon.

* Special note on GSOC core

This is, needless to say, a huge amount of work compressed down into a
few paragraphs of comments. Adds the pc64/vmm subdirectory and tons
of stuff to support hardware virtualization in guest-user mode, plus
the ability for programs (vkernels) running in this mode to make normal
system calls to the host.

* Add system call infrastructure for VMM mode operations in kern/sys_vmm.c
which vectors through a structure to machine-specific implementations.

vmm_guest_ctl_args()
vmm_guest_sync_addr_args()

vmm_guest_ctl_args() - bootstrap VMM and EPT modes. Copydown the original
user stack for EPT (since EPT 'physical' addresses cannot reach that far
into the backing store represented by the process's original VM space).
Also installs the GUEST_CR3 for the guest using parameters supplied by
the guest.

vmm_guest_sync_addr_args() - A host helper function that the vkernel can
use to invalidate page tables on multiple real cpus. This is a lot more
efficient than having the vkernel try to do it itself with IPI signals
via cpusync*().

* Add Intel VMX support to the host infrastructure. Again, tons of work
compressed down into a one paragraph commit message. Intel VMX support
added. AMD SVM support is not part of this GSOC and not yet supported
by DragonFly.

* Remove PG_* defines for PTE's and related mmu operations. Replace with
a table lookup so the same pmap code can be used for normal page tables
and also EPT tables.

* Also include X86_PG_V defines specific to normal page tables for a few
situations outside the pmap code.

* Adjust DDB to disassemble SVM related (intel) instructions.

* Add infrastructure to exit1() to deal with related structures.

* Optimize pfind() and pfindn() to remove the global token when looking
up the current process's PID (Matt)

* Add support for EPT (double layer page tables). This primarily required
adjusting the pmap code to use a table lookup to get the PG_* bits.

Add an indirect vector for copyin, copyout, and other user address space
copy operations to support manual walks when EPT is in use.

A multitude of system calls which manually looked up user addresses via
the vm_map now need a VMM layer call to translate EPT.

* Remove the MP lock from trapsignal() use cases in trap().

* (Matt) Add pthread_yield()s in most spin loops to help situations where
the vkernel is running on more cpu's than the host has, and to help with
scheduler edge cases on the host.

* (Matt) Add a pmap_fault_page_quick() infrastructure that vm_fault_page()
uses to try to shortcut operations and avoid locks. Implement it for
pc64. This function checks whether the page is already faulted in as
requested by looking up the PTE. If not it returns NULL and the full
blown vm_fault_page() code continues running.

* (Matt) Remove the MP lock from most of the vkernel's trap() code

* (Matt) Use a shared spinlock when possible for certain critical paths
related to the copyin/copyout path.
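
The PG_* removal mentioned above means the pmap code indexes a per-pmap
bit table instead of hardwired constants, so the same code can drive
normal page tables and EPT; roughly (a sketch, with index names following
the pc64 convention introduced by this work):

    /* before: compile-time x86 PTE bits */
    npte = pa | PG_V | PG_RW;

    /* after: per-pmap table lookup shared by normal and EPT formats */
    npte = pa | pmap->pmap_bits[PG_V_IDX] | pmap->pmap_bits[PG_RW_IDX];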



Revision tags: v3.4.3
# c4d9bf46 09-Aug-2013 François Tigeot <ftigeot@wolfpond.org>

kernel: Add pmap_page_set_memattr() stubs for the vkernel platforms
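
A stub of this kind is typically a no-op on the vkernel platforms, since
real page attributes are controlled by the host kernel; a plausible shape
(an assumption, not the committed code):

    void
    pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma)
    {
            /* nothing to do: the host manages the real mappings */
    }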


Revision tags: v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0
# 921c891e 13-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement segment pmap optimizations for x86-64

* Implement 2MB segment optimizations for x86-64. Any shared read-only
or read-write VM object mapped into memory, including physical objects
(so both sysv_shm and mmap), which is a multiple of the segment size
and segment-aligned can be optimized.

* Enable with sysctl machdep.pmap_mmu_optimize=1

Default is off for now. This is an experimental feature.

* It works as follows: A VM object which is large enough will, when VM
faults are generated, store a truncated pmap (PD, PT, and PTEs) in the
VM object itself.

VM faults whose vm_map_entry's can be optimized will cause the PTE, PT,
and also the PD (for now) to be stored in a pmap embedded in the VM_OBJECT,
instead of in the process pmap.

The process pmap then creates a PT entry in the PD page table that points
to the PT page table page stored in the VM_OBJECT's pmap.

* This removes nearly all page table overhead from fork()'d processes or
even unrelated processes which massively share data via mmap() or sysv_shm.
We still recommend using sysctl kern.ipc.shm_use_phys=1 (which is now
the default), which also removes the PV entries associated with the
shared pmap. However, with this optimization PV entries are no longer
a big issue since they will not be replicated in each process, only in
the common pmap stored in the VM_OBJECT.

* Features of this optimization:

* Number of PV entries is reduced to approximately the number of live
pages and no longer multiplied by the number of processes separately
mapping the shared memory.

* One process faulting in a page naturally makes the PTE available to
all other processes mapping the same shared memory. The other processes
do not have to fault that same page in.

* Page tables survive process exit and restart.

* Once page tables are populated and cached, any new process that maps
the shared memory will take far fewer faults because each fault will
bring in an ENTIRE page table. Postgres w/ 64-clients, VM fault rate
was observed to drop from 1M faults/sec to less than 500 at startup,
and during the run the fault rates dropped from a steady decline into
the hundreds of thousands into an instant decline to virtually zero
VM faults.

* We no longer have to depend on sysv_shm to optimize the MMU.

* CPU caches will do a better job caching page tables since most of
them are now themselves shared. Even when we invltlb, more of the
page tables will be in the L1, L2, and L3 caches.

* EXPERIMENTAL!!!!!
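
The "multiple of the segment size and segment-aligned" requirement above
translates to a check along these lines (illustrative; the field and
constant names are assumptions based on the pc64 code, where SEG_SIZE is
the 2MB segment):

    static int
    segment_optimizable(const struct vm_map_entry *entry)
    {
            /* start, end and object offset must all be 2MB-aligned */
            return (((entry->start | entry->end | entry->offset) &
                     SEG_MASK) == 0);
    }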



Revision tags: v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 86d7f5d3 26-Nov-2011 John Marino <draco@marino.st>

Initial import of binutils 2.22 on the new vendor branch

Future versions of binutils will also reside on this branch rather
than continuing to create new binutils branches for each new version.


# b12defdc 18-Oct-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pmap code. These fixes remove
nearly all the spin lock contention for non-threaded VM faults and narrows
contention for threaded VM faults to just the threads sharing the pmap.

Multi-socket many-core machines will see a 30-50% improvement in parallel
build performance (tested on a 48-core opteron), depending on how well
the build parallelizes.

As part of this work a long-standing problem on 64-bit systems where programs
would occasionally seg-fault or bus-fault for no reason has been fixed. The
problem was related to races between vm_fault, the vm_object collapse code,
and the vm_map splitting code.

* Most uses of vm_token have been removed. All uses of vm_spin have been
removed. These have been replaced with per-object tokens and per-queue
(vm_page_queues[]) spin locks.

Note in particular that since we still have the page coloring code the
PQ_FREE and PQ_CACHE queues are actually many queues, individually
spin-locked, resulting in very excellent MP page allocation and freeing
performance.

* Reworked vm_page_lookup() and vm_object->rb_memq. All (object,pindex)
lookup operations are now covered by the vm_object hold/drop system,
which utilize pool tokens on vm_objects. Calls now require that the
VM object be held in order to ensure a stable outcome.

Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
vm_page_busy_wait(), vm_page_busy_try(), and other API functions
which integrate the PG_BUSY handling.

* Added OBJ_CHAINLOCK. Most vm_object operations are protected by
the vm_object_hold/drop() facility which is token-based. Certain
critical functions which must traverse backing_object chains use
a hard-locking flag and lock almost the entire chain as it is traversed
to prevent races against object deallocation, collapses, and splits.

The last object in the chain (typically a vnode) is NOT locked in
this manner, so concurrent faults which terminate at the same vnode will
still have good performance. This is important e.g. for parallel compiles
which might be running dozens of the same compiler binary concurrently.

* Created a per vm_map token and removed most uses of vmspace_token.

* Removed the mp_lock in sys_execve(). It has not been needed in a while.

* Add kmem_lim_size() which returns approximate available memory (reduced
by available KVM), in megabytes. This is now used to scale up the
slab allocator cache and the pipe buffer caches to reduce unnecessary
global kmem operations.

* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
scan code, and the pageout scan code. These routines were rewritten
to use the per-queue spin locks.

* Replaced the exponential backoff in the spinlock code with something
a bit less complex and cleaned it up.

* Restructured the IPIQ func/arg1/arg2 array for better cache locality.
Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
which is used by other cores to determine if they need to issue an
actual hardware IPI or not. This reduces hardware IPI issuance
considerably (and the removal of the decontention code reduced it even
more).

* Temporarily removed the lwkt thread fairq code and disabled a number of
features. These will be worked back in once we track down some of the
remaining performance issues.

Temporarily removed the lwkt thread resequencer for tokens for the same
reason. This might wind up being permanent.

Added splz_check()s in a few critical places.

* Increased the number of pool tokens from 1024 to 4001 and went to a
prime-number mod algorithm to reduce overlaps.

* Removed the token decontention code. This was a bit of an eyesore and
while it did its job when we had global locks it just gets in the way now
that most of the global locks are gone.

Replaced the decontention code with a fall back which acquires the
tokens in sorted order, to guarantee that deadlocks will always be
resolved eventually in the scheduler.

* Introduced a simplified spin-for-a-little-while function
_lwkt_trytoken_spin() that the token code now uses rather than giving
up immediately.

* The vfs_bio subsystem no longer uses vm_token and now uses the
vm_object_hold/drop API for buffer cache operations, resulting
in very good concurrency.

* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
which fixes a deadlock.

* Adjusted all platform pmap.c's to handle the new main kernel APIs. The
i386 pmap.c is still a bit out of date but should be compatible.

* Completely rewrote very large chunks of the x86-64 pmap.c code. The
critical path no longer needs pmap_spin but pmap_spin itself is still
used heavily, particularly in the pv_entry handling code.

A per-pmap token and per-pmap object are now used to serialize pmap
access and vm_page lookup operations when needed.

The x86-64 pmap.c code now uses only vm_page->crit_count instead of
both crit_count and hold_count, which fixes races against other parts of
the kernel that use vm_page_hold().

_pmap_allocpte() mechanics have been completely rewritten to remove
potential races. Much of pmap_enter() and pmap_enter_quick() has also
been rewritten.

Many other changes.

* The following subsystems (and probably more) no longer use the vm_token
or vmobj_token in critical paths:

x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.

x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
instead of vm_token.

x vnode_pager

x zalloc

x vm_page handling

x vfs_bio

x umtx system calls

x vm_fault and friends

* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
revealed by recent global lock removals.

* lockmgr() locks no longer support LK_NOSPINWAIT. Spin locks are
unconditionally acquired.

* Replaced netif/e1000's spinlocks with lockmgr locks. The spinlocks
were not appropriate owing to the large context they were covering.

* Misc atomic ops added
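
The hold/drop rule described above gives (object,pindex) lookups the
following shape; this is a sketch of the usage pattern, not a quote of
the patch:

    vm_object_hold(object);         /* stabilize the (object,pindex) space */
    m = vm_page_lookup_busy_wait(object, pindex, TRUE, "pgblk");
    if (m) {
            /* ... operate on the busied page ... */
            vm_page_wakeup(m);
    }
    vm_object_drop(object);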



Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0
# da23a592 09-Dec-2010 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add support for up to 63 cpus & 512G of ram for 64-bit builds.

* Increase SMP_MAXCPU to 63 for 64-bit builds.

* cpumask_t is 64 bits on 64-bit builds now. It remains 32 bits on 32-bit
builds.

* Add #define's for atomic_set_cpumask(), atomic_clear_cpumask, and
atomic_cmpset_cpumask(). Replace all use cases on cpu masks with
these functions.

* Add CPUMASK(), BSRCPUMASK(), and BSFCPUMASK() macros. Replace all
use cases on cpu masks with these functions.

In particular note that (1 << cpu) just doesn't work with a 64-bit
cpumask.

Numerous bits of assembly also had to be adjusted to use e.g. btq instead
of btl, etc.

* Change __uint32_t declarations that were meant to be cpu masks to use
cpumask_t (most already have).

Also change other bits of code which work on cpu masks to be more agnostic.
For example, poll_cpumask0 and lwp_cpumask.

* 64-bit atomic ops cannot use "iq", they must use "r", because most x86-64
do NOT have 64-bit immediate value support.

* Rearrange initial kernel memory allocations to start from KvaStart and
not KERNBASE, because only 2GB of KVM is available after KERNBASE.

Certain VM allocations with > 32G of ram can exceed 2GB. For example,
vm_page_array[]. 2GB was not enough.

* Remove numerous mdglobaldata fields that are not used.

* Align CPU_prvspace[] for now. Eventually it will be moved into a
mapped area. Reserve sufficient space at MPPTDI now, but it is
still unused.

* When pre-allocating kernel page table PD entries calculate the number
of page table pages at KvaStart and at KERNBASE separately, since
the KVA space starting at KERNBASE caps out at 2GB.

* Change kmem_init() and vm_page_startup() to not take memory range
arguments. Instead the globals (virtual_start and virtual_end) are
manipulated directly.
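
The "(1 << cpu) just doesn't work" remark refers to plain int shifts
overflowing once cpu numbers reach 31; the macros widen the constant
first. A sketch of the idea:

    /* cpumask_t is 64 bits on 64-bit builds */
    #define CPUMASK(cpu)    ((cpumask_t)1 << (cpu))

    mask |= (1 << cpu);     /* broken: int shift, undefined for cpu >= 31 */
    mask |= CPUMASK(cpu);   /* correct for all 63 cpus */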



Revision tags: v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0
# ad54aa11 15-Sep-2010 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase x86_64 & vkernel kvm, adjust vm_page_array mapping

* Change the vm_page_array and dmesg space to not use the DMAP area.
The space could not be accessed by userland kvm utilities due
to that issue.

TODO - reoptimize to use 2M super-pages.

* Auto-size NKPT to accommodate the above changes as vm_page_array[]
is now mapped into the kernel page tables.

* Increase NKPDPE to 128 PDPs to accommodate machines with large
amounts of ram. This increases the kernel KVA space to 128G.



Revision tags: v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0
# da673940 17-Aug-2009 Jordan Gordeev <jgordeev@dir.bg>

Add platform vkernel64.