History log of /dragonfly/sys/platform/vkernel64/include/pmap.h (Results 1 – 23 of 23)
Revision	Date	Author	Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 8078b160 03-Jun-2021 Aaron LI <aly@aaronly.me>

pmap: Eliminate a simple macro 'pte_load_clear()'

First, this macro is not used in vkernel64's pmap code. Second, it
appears out of place and looks unrelated to the rest of the pmap.h
header. So just substitute it in the pmap code and get rid of it.
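
For context, a pte_load_clear()-style macro usually just wraps an atomic
read-and-clear of the PTE, so call sites can use the primitive directly.
A minimal sketch, assuming the usual atomic_readandclear_long() form (the
exact definition being removed is not shown in this log):

    /* assumed shape of the removed macro */
    #define pte_load_clear(ptep)    atomic_readandclear_long(ptep)

    /* after the substitution, call sites simply do */
    pt_entry_t opte = atomic_readandclear_long(ptep);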



# 7e0dbbc6 03-Jun-2021 Aaron LI <aly@aaronly.me>

vm/pmap.h: Move vtophys() and vtophys_pte() macros here

The two macros are defined in terms of pmap_kextract(), which is also
declared in this header file, so it is a better place to hold the two
macros.

In addition, this adjustment avoids duplicating them in both pc64 and
vkernel64.
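
For reference, both macros are thin wrappers around pmap_kextract(); a
minimal sketch of their likely form (the real definitions are the ones
now living in <vm/pmap.h>):

    #define vtophys(va)     pmap_kextract((vm_offset_t)(va))
    #define vtophys_pte(va) ((pt_entry_t)pmap_kextract((vm_offset_t)(va)))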



# c713db65 24-May-2021 Aaron LI <aly@aaronly.me>

vm: Change 'kernel_pmap' global to pointer type

Following the previous commits, this commit changes 'kernel_pmap' to a
pointer of type 'struct pmap *'. This aligns it better with
'kernel_map' and simplifies the code a bit.

No functional changes.
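
In header terms the change amounts to the following (a sketch, not the
literal diff); call sites that used to pass &kernel_pmap now pass the
pointer directly:

    /* before: a global object, callers take its address */
    extern struct pmap   kernel_pmap;

    /* after: a global pointer, matching the 'kernel_map' convention */
    extern struct pmap   *kernel_pmap;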



Revision tags: v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# da82a65a 11-Nov-2019 zrj <rimvydas.jasinskas@gmail.com>

Add <sys/cpumask.h>.

Collect and gather all scattered cpumask bits into the correct headers.
This cleans up the namespace and simplifies platform handling in asm macros.
The cpumask_t together with its macros is already a non-MI feature that is
used in userland utilities, libraries, the kernel scheduler and syscalls.
It deserves a sys/ header. Adjust syscalls.master and rerun sysent.

While there, fix an issue in ports that set a POSIX env but have an
implementation of setting thread names through pthread_set_name_np().



# b7d3e109 15-Sep-2019 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Add MD_PAGE_FREEABLE() dummy macro

* Add a dummy macro for MD_PAGE_FREEABLE() so the vkernel builds.

* Fix vkernel compile error due to st_blksize size change.

Reported-by: swildner
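
A dummy macro of this kind typically just reports every page as freeable,
since the vkernel keeps no per-page MD state to consult; a plausible
sketch (the actual stub may differ):

    #define MD_PAGE_FREEABLE(m)     1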


Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0
# ae442b2e 17-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 10 - Precursor work for terminal pv_entry removal

* Effectively remove pmap_track_modified(). Turn it into an assertion.
The normal pmap code should NEVER EVER be called with any range inside
the clean map.

This assertion, and the routine in its entirety, will be removed in a
later commit.

* The purpose of the original code was to prevent buffer cache kvm mappings
from being misinterpreted as contributing to the underlying vm_page's
modified state. Normal paging operation synchronizes the modified bit and
then transfers responsibility to the buffer cache. We didn't want
manipulation of the buffer cache to further affect the modified bit for
the page.

In modern times, the buffer cache does NOT use a kernel_object based
mapping for anything and there should be no chance of any kernel related
pmap_enter() (entering a managed page into the kernel_pmap) from messing
with the space.
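
The "turn it into an assertion" shape described above is roughly the
following, assuming the usual clean_sva/clean_eva bounds of the clean
map (illustrative only, not the literal patch):

    static __inline int
    pmap_track_modified(vm_offset_t va)
    {
            /* normal pmap code must never see a clean-map address */
            KKASSERT(va < clean_sva || va >= clean_eva);
            return 1;
    }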



Revision tags: v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc
# 95270b7e 01-Feb-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Many fixes for vkernel support, plus a few main kernel fixes

REAL KERNEL

* The big enchilada is that the main kernel's thread switch code has
a small timing window where it clears the PM_ACTIVE bit for the cpu
while switching between two threads. However, it *ALSO* checks and
avoids loading the %cr3 if the two threads have the same pmap.

This results in a situation where an invalidation on the pmap in another
cpu may not have visibility to the cpu doing the switch, and yet the
cpu doing the switch also decides not to reload %cr3 and so does not
invalidate the TLB either. The result is a stale TLB and bad things
happen.

For now just unconditionally load %cr3 until I can come up with code
to handle the case.

This bug is very difficult to reproduce on a normal system, it requires
a multi-threaded program doing nasty things (munmap, etc) on one cpu
while another thread is switching to a third thread on some other cpu.

* KNOTE after handling the vkernel trap in postsig() instead of before.

* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs
argument instead of a 32-bit npgs argument. This fixes situations
that crop up when a process uses more than 16TB of address space.

* Add an lfence to the pmap invalidation code that I think might be
needed.

* Handle some wrap/overflow cases in pmap_scan() related to the use of
large address spaces.

* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.

* Test PG_RW after locking the pv_entry to handle potential races.

* Add bio_crc to struct bio. This field is only used for debugging for
now but may come in useful later.

* Add some global debug variables in the pmap_inval_smp() and related
paths. Refactor the npgs handling.

* Load the tsc_target field after waiting for completion of the previous
invalidation op instead of before. Also add a conservative mfence()
in the invalidation path before loading the info fields.

* Remove the global pmap_inval_bulk_count counter.

* Adjust swtch.s to always reload the user process %cr3, with an
explanation. FIXME LATER!

* Add some test code to vm/swap_pager.c which double-checks that the page
being paged out does not get corrupted during the operation. This code
is #if 0'd.

* We must hold an object lock around the swp_pager_meta_ctl() call in
swp_pager_async_iodone(). I think.

* Reorder when PG_SWAPINPROG is cleared. Finish the I/O before clearing
the bit.

* Change the vm_map_growstack() API to pass a vm_map in instead of
curproc.

* Use atomic ops for vm_object->generation counts, since objects can be
locked shared.

VKERNEL

* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN.
This solves a severe FP corruption bug in the vkernel due to calls it
makes into libc (which uses %xmm registers all over the place).

This is not a complete fix. We need a formal userspace/kernelspace FP
abstraction. Right now the vkernel doesn't have a kernelspace FP
abstraction so if a kernel thread switches preemptively bad things
happen.

* The kernel tracks and locks pv_entry structures to interlock pte's.
The vkernel never caught up, and does not really have a pv_entry or
placemark mechanism. The vkernel's pmap really needs a complete
re-port from the real-kernel pmap code. Until then, we use poor hacks.

* Use the vm_page's spinlock to interlock pte changes.

* Make sure that PG_WRITEABLE is set or cleared with the vm_page
spinlock held.

* Have pmap_clearbit() acquire the pmobj token for the pmap in the
iteration. This appears to be necessary, currently, as most of the
rest of the vkernel pmap code also uses the pmobj token.

* Fix bugs in the vkernel's swapu32() and swapu64().

* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy
the page. Note however that a page table page is currently never
soft-busied. Also adjust other vkernel code that busies a page table page.

* Fix some silly code in a pmap->pm_ptphint test.

* Don't inherit e.g. PG_M from the previous pte when overwriting it
with a pte of a different physical address.

* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW
(which also clears VPTE_M), and not just VPTE_M. Formally we want
the vkernel to be notified when a page becomes modified and it won't
be unless we also clear VPTE_RW and force a fault. <--- I may change
this back after testing.

* Wrap pmap_replacevm() with a critical section.

* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle
it (vm_fault_page() just now got the ability).

* Properly flag VM_FAULT_USERMODE.



# c50e690b 30-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Change how VPTE_M works to fix seg-faults during paging

* Properly set and clear PG_WRITEABLE

* TAILQ_FOREACH() iterations on m->md.pv_list must be restarted
if we ever drop the spin-lock.

* Change VPAGETABLE semantics and operation, cleaning up some things
and fixing others.

Have the real-kernel only conditionally downgrade the real pte to
read-only for a VPTE_RW vpte. It only downgrades it if VPTE_M is
not set, improving performance.

Fix the virtual kernel to properly invalidate the real-kernel pte's
when clearing VPTE_M. This improves issues that crop up when the
vkernel is paging heavily.

* Replace the linear pv_plist with a RB tree. Also have pmap_remove_pages()
simply call pmap_remove().

Note that pmap_remove_pages()'s old code was broken because it only
scanned the pv_entry list and missed unmanaged pages. Fixing this
also fixes a vmspace reuse issue where the real-host pmap still
contained stale PTEs from prior use.
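
The conditional downgrade described above boils down to logic along these
lines; VPTE_RW and VPTE_M are the vkernel's vpte bits, while the helper
name is a hypothetical stand-in for the real kernel's PTE manipulation:

    if ((vpte & VPTE_RW) && (vpte & VPTE_M) == 0) {
            /*
             * Writable in the vkernel but not yet marked modified: keep
             * the backing real PTE read-only so the first write faults
             * and the host can set VPTE_M before granting write access.
             */
            real_pte_downgrade_ro(va);      /* hypothetical helper */
    }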



# c78d5661 26-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

vkernel - Refactor pmap

* Refactor the pmap code. Use vm_page locking to protect PTEs.

* Change the accounting from using vm_page->hold_count to using
vm_page->wire_count.

* Replace unlocked pt/pd/pdp lookups with explicit page tests for non-kernel
pmaps.



# 76f1911e 23-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - pmap and vkernel work

* Remove the pmap.pm_token entirely. The pmap is currently protected
primarily by fine-grained locks and the vm_map lock. The intention
is to eventually be able to protect it without the vm_map lock at all.

* Enhance pv_entry acquisition (representing PTE locations) to include
a placemarker facility for non-existent PTEs, allowing the PTE location
to be locked whether a pv_entry exists for it or not.

* Fix dev_dmmap (struct dev_mmap) (for future use), it was returning a
page index for physical memory as a 32-bit integer instead of a 64-bit
integer.

* Use pmap_kextract() instead of pmap_extract() where appropriate.

* Put the token contention test back in kern_clock.c for real kernels
so token contention shows up as sys% instead of idle%.

* Modify the pmap_extract() API to also return a locked pv_entry,
and add pmap_extract_done() to release it. Adjust users of
pmap_extract().

* Change madvise/mcontrol MADV_INVAL (used primarily by the vkernel)
to use a shared vm_map lock instead of an exclusive lock. This
significantly improves the vkernel's performance and significantly
reduces stalls and glitches when typing in one under heavy loads.

* The new placemarkers also have the side effect of fixing several
difficult-to-reproduce bugs in the pmap code, by ensuring that
shared and unmanaged pages are properly locked whereas before only
managed pages (with pv_entry's) were properly locked.

* Adjust the vkernel's pmap code to use atomic ops in numerous places.

* Rename the pmap_change_wiring() call to pmap_unwire(). The routine
was only being used to unwire (and could only safely be called for
unwiring anyway). Remove the unused 'wired' and the 'entry'
arguments.

Also change how pmap_unwire() works to remove a small race condition.

* Fix race conditions in the vmspace_*() system calls which could lead
to pmap corruption. Note that the vkernel did not trigger any of
these conditions, I found them while looking for another bug.

* Add missing maptypes to procfs's /proc/*/map report.



# 534ee349 28-Dec-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement RLIMIT_RSS, Increase maximum supported swap

* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
vm_page_queues[] array).

* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.

* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).

* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.

* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
reallocatable.

* Refactor the vm_pageout_deficit variable, using atomic ops.

* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.

* Refactor kern/subr_blist.c and all associated code that uses it to
increase swblk_t from int32_t to int64_t, and to increase the supported
radix from 31 bits to 63 bits.

This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.

* There are two kernel data structures associated with swap. The blmeta
structure which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure whose KVA is reserved
but not allocated.

The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_object's. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to actually do that.

A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with > 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.

* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
for.
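
The swmeta sizing rule above reduces to a simple formula; a small worked
sketch using the numbers quoted in the text:

    /*
     * swmeta tracks swap at roughly a 1:341 (ram:swap) ratio and the
     * kernel caps it at about half of physical memory, so:
     *
     *      max_swap ~= (ram / 2) * 341
     *
     *      16 GB ram   -> 16/2*341  ~= 2.7 TB
     *      128 GB ram  -> 128/2*341 ~= 21 TB
     *      >512 GB ram -> KVM-capped at 256/2*341 ~= 43 TB
     */
    static uint64_t
    max_swap_bytes(uint64_t ram_bytes)
    {
            return (ram_bytes / 2) * 341;
    }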



Revision tags: v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 2f0acc22 17-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve physio performance

* See http://apollo.backplane.com/DFlyMisc/nvme_sys03.txt

* Hash the pbuf system. This chops down spin-lock collisions
at high transaction rates (>150K IOPS) by 1000x.

* Implement a pbuf with pre-allocated kernel memory that we
copy into, avoiding page table manipulations and thus
avoiding system-wide invltlb/invlpg IPIs.

* This increases NVMe IOPS tests with three cards from
150K-200K IOPS to 950K IOPS using physio (random read,
4K blocks, from urandom-filled partition, with many
process threads, from 3 NVMe cards in parallel).

* Further adjustments to the vkernel build.



Revision tags: v4.4.3, v4.4.2
# 2c64e990 25-Jan-2016 zrj <rimvydas.jasinskas@gmail.com>

Remove advertising header from sys/

Correct BSD License clause numbering from 1-2-4 to 1-2-3.

Some less clear cases were handled the same way as was done in FreeBSD.


Revision tags: v4.4.1, v4.4.0, v4.5.0, v4.4.0rc, v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2
# cc694a4a 30-Jun-2014 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Move CPUMASK_LOCK out of the cpumask_t

* Add cpulock_t (a 32-bit integer on all platforms) and implement
CPULOCK_EXCL as well as space for a counter.

* Break out CPUMASK_LOCK, add a new field to the pmap (pm_active_lock)
and to the process vmm (p_vmm_cpulock), and implement the mmu interlock
there.

The VMM subsystem uses additional bits in cpulock_t as a mask counter
for implementing its interlock.

The PMAP subsystem just uses the CPULOCK_EXCL bit in pm_active_lock for
its own interlock.

* Max cpus on 64-bit systems is now 64 instead of 63.

* cpumask_t is now just a pure cpu mask and no longer requires all-or-none
atomic ops, just normal bit-for-bit atomic ops. This will allow us to
hopefully extend it past the 64-cpu limit soon.



Revision tags: v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2
# 381fa6da 02-Mar-2014 Sascha Wildner <saw@online.de>

Some fixes to allow building with gcc44.

Most of them for type redefinitions which gcc47 has stopped warning about
(if they are compatible).

The libstdc++ fix is modeled after gcc47's libstdc++. We don't have
__libc_C_ctype_[] anymore.



Revision tags: v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0
# a86ce0cd 20-Sep-2013 Matthew Dillon <dillon@apollo.backplane.com>

hammer2 - Merge Mihai Carabas's VKERNEL/VMM GSOC project into the main tree

* This merge contains work primarily by Mihai Carabas, with some misc
fixes also by Matthew Dillon.

* Special note on GSOC core

This is, needless to say, a huge amount of work compressed down into a
few paragraphs of comments. Adds the pc64/vmm subdirectory and tons
of stuff to support hardware virtualization in guest-user mode, plus
the ability for programs (vkernels) running in this mode to make normal
system calls to the host.

* Add system call infrastructure for VMM mode operations in kern/sys_vmm.c
which vectors through a structure to machine-specific implementations.

vmm_guest_ctl_args()
vmm_guest_sync_addr_args()

vmm_guest_ctl_args() - bootstrap VMM and EPT modes. Copydown the original
user stack for EPT (since EPT 'physical' addresses cannot reach that far
into the backing store represented by the process's original VM space).
Also installs the GUEST_CR3 for the guest using parameters supplied by
the guest.

vmm_guest_sync_addr_args() - A host helper function that the vkernel can
use to invalidate page tables on multiple real cpus. This is a lot more
efficient than having the vkernel try to do it itself with IPI signals
via cpusync*().

* Add Intel VMX support to the host infrastructure. Again, tons of work
compressed down into a one paragraph commit message. Intel VMX support
added. AMD SVM support is not part of this GSOC and not yet supported
by DragonFly.

* Remove PG_* defines for PTE's and related mmu operations. Replace with
a table lookup so the same pmap code can be used for normal page tables
and also EPT tables.

* Also include X86_PG_V defines specific to normal page tables for a few
situations outside the pmap code.

* Adjust DDB to disassemble SVM related (intel) instructions.

* Add infrastructure to exit1() to deal with related structures.

* Optimize pfind() and pfindn() to remove the global token when looking
up the current process's PID (Matt)

* Add support for EPT (double layer page tables). This primarily required
adjusting the pmap code to use a table lookup to get the PG_* bits.

Add an indirect vector for copyin, copyout, and other user address space
copy operations to support manual walks when EPT is in use.

A multitude of system calls which manually looked up user addresses via
the vm_map now need a VMM layer call to translate EPT.

* Remove the MP lock from trapsignal() use cases in trap().

* (Matt) Add pthread_yield()s in most spin loops to help situations where
the vkernel is running on more cpu's than the host has, and to help with
scheduler edge cases on the host.

* (Matt) Add a pmap_fault_page_quick() infrastructure that vm_fault_page()
uses to try to shortcut operations and avoid locks. Implement it for
pc64. This function checks whether the page is already faulted in as
requested by looking up the PTE. If not it returns NULL and the full
blown vm_fault_page() code continues running.

* (Matt) Remove the MP lock from most of the vkernel's trap() code

* (Matt) Use a shared spinlock when possible for certain critical paths
related to the copyin/copyout path.
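
The PG_* removal mentioned above means the pmap code indexes a per-pmap
bit table instead of hardwired constants, so the same code can drive
normal page tables and EPT; roughly (a sketch, with index names following
the pc64 convention introduced by this work):

    /* before: compile-time x86 PTE bits */
    npte = pa | PG_V | PG_RW;

    /* after: per-pmap table lookup shared by normal and EPT formats */
    npte = pa | pmap->pmap_bits[PG_V_IDX] | pmap->pmap_bits[PG_RW_IDX];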



Revision tags: v3.4.3
# c4d9bf46 09-Aug-2013 François Tigeot <ftigeot@wolfpond.org>

kernel: Add pmap_page_set_memattr() stubs for the vkernel platforms
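
A stub of this kind is typically a no-op on the vkernel platforms, since
real page attributes are controlled by the host kernel; a plausible shape
(an assumption, not the committed code):

    void
    pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma)
    {
            /* nothing to do: the host manages the real mappings */
    }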


Revision tags: v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0
# 921c891e 13-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement segment pmap optimizations for x86-64

* Implement 2MB segment optimizations for x86-64. Any shared read-only
or read-write VM object mapped into memory, including physical objects
(so both sysv_shm and mmap), which is a multiple of the segment size
and segment-aligned can be optimized.

* Enable with sysctl machdep.pmap_mmu_optimize=1

Default is off for now. This is an experimental feature.

* It works as follows: A VM object which is large enough will, when VM
faults are generated, store a truncated pmap (PD, PT, and PTEs) in the
VM object itself.

VM faults whose vm_map_entry's can be optimized will cause the PTE, PT,
and also the PD (for now) to be stored in a pmap embedded in the VM_OBJECT,
instead of in the process pmap.

The process pmap then creates a PT entry in the PD page table that points
to the PT page table page stored in the VM_OBJECT's pmap.

* This removes nearly all page table overhead from fork()'d processes or
even unrelated processes which massively share data via mmap() or sysv_shm.
We still recommend using sysctl kern.ipc.shm_use_phys=1 (which is now
the default), which also removes the PV entries associated with the
shared pmap. However, with this optimization PV entries are no longer
a big issue since they will not be replicated in each process, only in
the common pmap stored in the VM_OBJECT.

* Features of this optimization:

* Number of PV entries is reduced to approximately the number of live
pages and no longer multiplied by the number of processes separately
mapping the shared memory.

* One process faulting in a page naturally makes the PTE available to
all other processes mapping the same shared memory. The other processes
do not have to fault that same page in.

* Page tables survive process exit and restart.

* Once page tables are populated and cached, any new process that maps
the shared memory will take far fewer faults because each fault will
bring in an ENTIRE page table. Postgres w/ 64-clients, VM fault rate
was observed to drop from 1M faults/sec to less than 500 at startup,
and during the run the fault rates dropped from a steady decline into
the hundreds of thousands into an instant decline to virtually zero
VM faults.

* We no longer have to depend on sysv_shm to optimize the MMU.

* CPU caches will do a better job caching page tables since most of
them are now themselves shared. Even when we invltlb, more of the
page tables will be in the L1, L2, and L3 caches.

* EXPERIMENTAL!!!!!
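
The "multiple of the segment size and segment-aligned" requirement above
translates to a check along these lines (illustrative; the field and
constant names are assumptions based on the pc64 code, where SEG_SIZE is
the 2MB segment):

    static int
    segment_optimizable(const struct vm_map_entry *entry)
    {
            /* start, end and object offset must all be 2MB-aligned */
            return (((entry->start | entry->end | entry->offset) &
                     SEG_MASK) == 0);
    }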



Revision tags: v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0
# 86d7f5d3 26-Nov-2011 John Marino <draco@marino.st>

Initial import of binutils 2.22 on the new vendor branch

Future versions of binutils will also reside on this branch rather
than continuing to create new binutils branches for each new version.


# b12defdc 18-Oct-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pmap code. These fixes remove
nearly all the spin lock contention for non-threaded VM faults and narrows
contention for threaded VM faults to just the threads sharing the pmap.

Multi-socket many-core machines will see a 30-50% improvement in parallel
build performance (tested on a 48-core opteron), depending on how well
the build parallelizes.

As part of this work a long-standing problem on 64-bit systems where programs
would occasionally seg-fault or bus-fault for no reason has been fixed. The
problem was related to races between vm_fault, the vm_object collapse code,
and the vm_map splitting code.

* Most uses of vm_token have been removed. All uses of vm_spin have been
removed. These have been replaced with per-object tokens and per-queue
(vm_page_queues[]) spin locks.

Note in particular that since we still have the page coloring code the
PQ_FREE and PQ_CACHE queues are actually many queues, individually
spin-locked, resulting in very excellent MP page allocation and freeing
performance.

* Reworked vm_page_lookup() and vm_object->rb_memq. All (object,pindex)
lookup operations are now covered by the vm_object hold/drop system,
which utilize pool tokens on vm_objects. Calls now require that the
VM object be held in order to ensure a stable outcome.

Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
vm_page_busy_wait(), vm_page_busy_try(), and other API functions
which integrate the PG_BUSY handling.

* Added OBJ_CHAINLOCK. Most vm_object operations are protected by
the vm_object_hold/drop() facility which is token-based. Certain
critical functions which must traverse backing_object chains use
a hard-locking flag and lock almost the entire chain as it is traversed
to prevent races against object deallocation, collapses, and splits.

The last object in the chain (typically a vnode) is NOT locked in
this manner, so concurrent faults which terminate at the same vnode will
still have good performance. This is important e.g. for parallel compiles
which might be running dozens of the same compiler binary concurrently.

* Created a per vm_map token and removed most uses of vmspace_token.

* Removed the mp_lock in sys_execve(). It has not been needed in a while.

* Add kmem_lim_size() which returns approximate available memory (reduced
by available KVM), in megabytes. This is now used to scale up the
slab allocator cache and the pipe buffer caches to reduce unnecessary
global kmem operations.

* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
scan code, and the pageout scan code. These routines were rewritten
to use the per-queue spin locks.

* Replaced the exponential backoff in the spinlock code with something
a bit less complex and cleaned it up.

* Restructured the IPIQ func/arg1/arg2 array for better cache locality.
Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
which is used by other cores to determine if they need to issue an
actual hardware IPI or not. This reduces hardware IPI issuance
considerably (and the removal of the decontention code reduced it even
more).

* Temporarily removed the lwkt thread fairq code and disabled a number of
features. These will be worked back in once we track down some of the
remaining performance issues.

Temporarily removed the lwkt thread resequencer for tokens for the same
reason. This might wind up being permanent.

Added splz_check()s in a few critical places.

* Increased the number of pool tokens from 1024 to 4001 and went to a
prime-number mod algorithm to reduce overlaps.

* Removed the token decontention code. This was a bit of an eyesore and
while it did its job when we had global locks it just gets in the way now
that most of the global locks are gone.

Replaced the decontention code with a fall back which acquires the
tokens in sorted order, to guarantee that deadlocks will always be
resolved eventually in the scheduler.

* Introduced a simplified spin-for-a-little-while function
_lwkt_trytoken_spin() that the token code now uses rather than giving
up immediately.

* The vfs_bio subsystem no longer uses vm_token and now uses the
vm_object_hold/drop API for buffer cache operations, resulting
in very good concurrency.

* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
which fixes a deadlock.

* Adjusted all platform pmap.c's to handle the new main kernel APIs. The
i386 pmap.c is still a bit out of date but should be compatible.

* Completely rewrote very large chunks of the x86-64 pmap.c code. The
critical path no longer needs pmap_spin but pmap_spin itself is still
used heavily, particularly in the pv_entry handling code.

A per-pmap token and per-pmap object are now used to serialize pmap
access and vm_page lookup operations when needed.

The x86-64 pmap.c code now uses only vm_page->crit_count instead of
both crit_count and hold_count, which fixes races against other parts of
the kernel that use vm_page_hold().

_pmap_allocpte() mechanics have been completely rewritten to remove
potential races. Much of pmap_enter() and pmap_enter_quick() has also
been rewritten.

Many other changes.

* The following subsystems (and probably more) no longer use the vm_token
or vmobj_token in critical paths:

x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.

x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
instead of vm_token.

x vnode_pager

x zalloc

x vm_page handling

x vfs_bio

x umtx system calls

x vm_fault and friends

* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
revealed by recent global lock removals.

* lockmgr() locks no longer support LK_NOSPINWAIT. Spin locks are
unconditionally acquired.

* Replaced netif/e1000's spinlocks with lockmgr locks. The spinlocks
were not appropriate owing to the large context they were covering.

* Misc atomic ops added
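
The hold/drop rule described above gives (object,pindex) lookups the
following shape; this is a sketch of the usage pattern, not a quote of
the patch:

    vm_object_hold(object);         /* stabilize the (object,pindex) space */
    m = vm_page_lookup_busy_wait(object, pindex, TRUE, "pgblk");
    if (m) {
            /* ... operate on the busied page ... */
            vm_page_wakeup(m);
    }
    vm_object_drop(object);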



Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0
# da23a592 09-Dec-2010 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add support for up to 63 cpus & 512G of ram for 64-bit builds.

* Increase SMP_MAXCPU to 63 for 64-bit builds.

* cpumask_t is 64 bits on 64-bit builds now. It remains 32 bits on 32-bit
builds.

* Add #define's for atomic_set_cpumask(), atomic_clear_cpumask, and
atomic_cmpset_cpumask(). Replace all use cases on cpu masks with
these functions.

* Add CPUMASK(), BSRCPUMASK(), and BSFCPUMASK() macros. Replace all
use cases on cpu masks with these functions.

In particular note that (1 << cpu) just doesn't work with a 64-bit
cpumask.

Numerous bits of assembly also had to be adjusted to use e.g. btq instead
of btl, etc.

* Change __uint32_t declarations that were meant to be cpu masks to use
cpumask_t (most already have).

Also change other bits of code which work on cpu masks to be more agnostic.
For example, poll_cpumask0 and lwp_cpumask.

* 64-bit atomic ops cannot use "iq", they must use "r", because most x86-64
do NOT have 64-bit immediate value support.

* Rearrange initial kernel memory allocations to start from KvaStart and
not KERNBASE, because only 2GB of KVM is available after KERNBASE.

Certain VM allocations with > 32G of ram can exceed 2GB. For example,
vm_page_array[]. 2GB was not enough.

* Remove numerous mdglobaldata fields that are not used.

* Align CPU_prvspace[] for now. Eventually it will be moved into a
mapped area. Reserve sufficient space at MPPTDI now, but it is
still unused.

* When pre-allocating kernel page table PD entries calculate the number
of page table pages at KvaStart and at KERNBASE separately, since
the KVA space starting at KERNBASE caps out at 2GB.

* Change kmem_init() and vm_page_startup() to not take memory range
arguments. Instead the globals (virtual_start and virtual_end) are
manipulated directly.
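
The "(1 << cpu) just doesn't work" remark refers to plain int shifts
overflowing once cpu numbers reach 31; the macros widen the constant
first. A sketch of the idea:

    /* cpumask_t is 64 bits on 64-bit builds */
    #define CPUMASK(cpu)    ((cpumask_t)1 << (cpu))

    mask |= (1 << cpu);     /* broken: int shift, undefined for cpu >= 31 */
    mask |= CPUMASK(cpu);   /* correct for all 63 cpus */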



Revision tags: v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0
# ad54aa11 15-Sep-2010 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase x86_64 & vkernel kvm, adjust vm_page_array mapping

* Change the vm_page_array and dmesg space to not use the DMAP area.
The space could not be accessed by userland kvm utilities due
to that issue.

TODO - reoptimize to use 2M super-pages.

* Auto-size NKPT to accommodate the above changes as vm_page_array[]
is now mapped into the kernel page tables.

* Increase NKPDPE to 128 PDPs to accommodate machines with large
amounts of ram. This increases the kernel KVA space to 128G.



Revision tags: v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0
# da673940 17-Aug-2009 Jordan Gordeev <jgordeev@dir.bg>

Add platform vkernel64.