History log of /dragonfly/sys/vm/vm_page.c (Results 1 – 25 of 192)
Revision    Date    Author    Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1
# 6379cf29 06-Jun-2021 Aaron LI <aly@aaronly.me>

kernel: Various minor whitespace adjustments and tweaks


# 1eeaf6b2 20-May-2021 Aaron LI <aly@aaronly.me>

vm: Change 'kernel_map' global to type of 'struct vm_map *'

Change the global variable 'kernel_map' from type 'struct vm_map' to a
pointer to this struct. This simplifies the code a bit since all
invocations take its address. This change also aligns with NetBSD's
'kernel_map', which is also a pointer, and helps with porting NVMM.

No functional changes.



# 14067db6 18-Jun-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add vm_page_alloczwq()/vm_page_freezwq(), refactor comments

* Add two functions to help support nvmm.

* Refactor comments for vm_page_alloc(), vm_page_grab().

* Rename a few variables for consistency.



# 2198d48d 18-May-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major refactor of pageout daemon algorithms (2)

* Refactor the pageout daemon's 'pass' variable handling. No longer
sleep or slow down the pageout daemon if paging has been running for
a long time; doing so just places the system in an unrecoverable
state.

* Fix a vm_paging_target2() test that was inverted. This caused
vm_pageout_scan_inactive() to improperly break out after scanning
one page per queue (x 1024 queues, so it wasn't readily apparent).

* Modify the 'vmwait' and 'pfault' trip points (that block a process on
low memory) based on the process nice value. The result is that nice +N
processes which are heavy memory users will tend to block before
completely exhausting the FREE+CACHE pool, allowing other lower-niced
processes to continue running somewhat normally (see the sketch at the
end of this entry).

I say somewhat because the mechanism isn't perfect. However, it should
be good enough that if bulk work is partitioned a bit with higher nice
values, shells should remain responsive enough to diagnose issues even
during extreme paging.

* For the moment make the 'vmwait' trip point only slightly worse
than the 'pfault' trip point at any given p_nice. The 'vmwait' trip
point used to be significantly worse (that is, would trip earlier).

We have generally found that blockages on pfault are significantly more
damaging to performance than blockages on I/O-related allocations, so
don't completely equalize the two trip points.

* Have the pageout code check for and report possible vm_page queue
races when removing pages. In particular, pages appear to be able
to wind up in PQ_CACHE that are still PG_MAPPED.

* Fix various calculations that divide by MAXSCAN_DIVIDER to error
on the high-side instead of the low-side.

* Fix vm.pageout_debug output to reduce spew.
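
A minimal sketch of the nice-biased trip point idea above; the helper
name and scaling factor are hypothetical and not the kernel's actual
'vmwait'/'pfault' computation:

#include <sys/resource.h>               /* PRIO_MAX */

/*
 * Hypothetical sketch: raise the blocking trip point for nice +N
 * processes so heavy memory users at positive nice begin stalling
 * while FREE+CACHE still has headroom for lower-niced processes.
 */
long
paging_trip_point(long base_trip, int p_nice)
{
    long bias = 0;

    if (p_nice > 0)                     /* negative nice never relaxes it */
        bias = base_trip * p_nice / (PRIO_MAX * 4);
    return (base_trip + bias);
}

/* A process stalls when (freecount + cachecount) < paging_trip_point(...). */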



# e91e64c7 17-May-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major refactor of pageout daemon algorithms

* Rewrite a large chunk of the pageout daemon's algorithm to significantly
improve page selection for pageout on low-memory systems.

* Implement persistent markers for hold and active queue scans. Instead
of moving pages within the queues, we now implement a persistent marker
and just move the marker instead. This ensures 100% fair scanning of
these queues.

* The pageout state machine is now governed by the following sysctls
(with some example default settings from a 32G box containing 8071042
pages):

vm.v_free_reserved: 20216
vm.v_free_min: 40419
vm.v_paging_wait: 80838
vm.v_paging_start: 121257
vm.v_paging_target1: 161676
vm.v_paging_target2: 202095

And separately

vm.v_inactive_target: 484161

The arrangement is as follows:

reserved < severe < minimum < wait < start < target1 < target2

* Paging is governed as follows: The pageout daemon is activated when
FREE+CACHE falls below (v_paging_start). The daemon will free memory
up until FREE+CACHE reaches (v_paging_target1), and then continue to
free memory more slowly until FREE+CACHE reaches (v_paging_target2).

If, due to memory demand, FREE+CACHE falls below (v_paging_wait), most
userland processes will begin short-stalls on VM allocations and page
faults, and return to normal operation once FREE+CACHE goes above
(v_paging_wait) (that is, as soon as possible).

If, due to memory demand, FREE+CACHE falls below (v_paging_min), most
userland processes will block on VM allocations and page faults until
the level returns to above (v_paging_wait).

The hysteresis between (wait) and (start) allows most processes to
continue running normally during nominal paging activities (see the
sketch after this list).

* The pageout daemon operates in batches and then loops as necessary.
Pages will be moved from CACHE to FREE as necessary, then from INACTIVE
to CACHE as necessary, then from ACTIVE to INACTIVE as necessary. Care
is taken to avoid completely exhausting any given queue to ensure that
the queue scan is reasonably efficient.

* The ACTIVE to INACTIVE scan has been significantly reorganized and
integrated with the page_stats scan (which updates m->act_count for
pages in the ACTIVE queue). Pages in the ACTIVE queue are no longer
moved within the lists. Instead a persistent roving marker is employed
for each queue.

The m->act_count test is made against a dynamically adjusted comparison
variable called vm.pageout_stats_actcmp. When no progress is made this
variable is increased, and when sufficient progress is made this variable
is decreased. Thus, under very heavy memory loads, a more permissive
m->act_count test allows active pages to be deactivated more quickly.

* The INACTIVE to FREE+CACHE scan remains relatively unchanged. A two-pass
LRU arrangement continues to be employed in order to give the system
time to reclaim a deactivated page before it would otherwise get paged out.

* The vm_pageout_page_stats() scan has been almost completely rewritten.
This scan is responsible for updating m->act_count on pages in the
ACTIVE queue. Example sysctl settings shown below

vm.pageout_stats_rsecs: 300 <--- passive run time (seconds) after pageout
vm.pageout_stats_scan: 472 <--- max number of pages to scan per tick
vm.pageout_stats_ticks: 10 <--- poll rate in ticks
vm.pageout_stats_inamin: 16 <--- inactive ratio governing dynamic
vm.pageout_stats_inalim: 4096 adjustment of actcmp.
vm.pageout_stats_actcmp: 2 <--- dynamically adjusted by the kernel

The page stats code polls slowly and will update m->act_count and
deactivate pages until it is able to achieve (v_inactive_target) worth
of pages in the inactive queue.

Once this target has been reached, the poll stops deactivating pages, but
will continue to run for (pageout_stats_rsecs) seconds after the pageout
daemon last ran (typically 5 minutes) and continue to passively update
m->act_count during this period.

The polling resumes upon any pageout daemon activation and the cycle
repeats.

* The vm_pageout_page_stats() scan is mostly responsible for selecting
the correct pages to move from ACTIVE to INACTIVE. Choosing the correct
pages allows the system to continue to operate smoothly while concurrent
paging is in progress. The additional 5 minutes of passive operation
allows it to pre-stage m->act_count for pages in the ACTIVE queue to
help grease the wheels for the next pageout daemon activation.
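
A minimal sketch of the threshold ordering and hysteresis described
above; the struct, enum, and helper names are illustrative stand-ins,
not the real vmstats/pageout interfaces:

/*
 * Hypothetical sketch of
 *     reserved < severe < minimum < wait < start < target1 < target2
 * and the resulting hysteresis.
 */
struct paging_thresholds {
    long v_free_min;            /* processes hard-block below this */
    long v_paging_wait;         /* processes short-stall below this */
    long v_paging_start;        /* pageout daemon activates below this */
    long v_paging_target1;      /* daemon frees aggressively up to here */
    long v_paging_target2;      /* then tapers off up to here */
};

enum paging_mode { PAGING_IDLE, PAGING_SLOW, PAGING_FAST };

/* What the pageout daemon should be doing for a given FREE+CACHE level. */
enum paging_mode
paging_next_mode(const struct paging_thresholds *t, long free_plus_cache,
                 enum paging_mode cur)
{
    if (free_plus_cache < t->v_paging_start)
        return PAGING_FAST;             /* activate and free hard */
    if (cur != PAGING_IDLE && free_plus_cache < t->v_paging_target1)
        return PAGING_FAST;             /* keep going until target1 */
    if (cur != PAGING_IDLE && free_plus_cache < t->v_paging_target2)
        return PAGING_SLOW;             /* ease off between target1 and target2 */
    return PAGING_IDLE;
}

/* How hard a userland VM allocation or page fault should stall. */
int
paging_stall_level(const struct paging_thresholds *t, long free_plus_cache)
{
    if (free_plus_cache < t->v_free_min)
        return 2;                       /* block until above v_paging_wait */
    if (free_plus_cache < t->v_paging_wait)
        return 1;                       /* short-stall */
    return 0;
}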

TESTING

* On a test box with memory limited to 2GB, running chrome. Video runs
smoothly despite constant paging. Active tabs appear to operate smoothly.
Inactive tabs are able to page-in decently fast and resume operation.

* On a workstation with 32GB of memory and a large number of open chrome
tabs, allowed to sit overnight (chrome burns up a lot of memory when tabs
remain open), then video was tested the next day. Paging appeared to operate
well and so far there has been no stuttering.

* On a 64GB build box running dsynth 32/32 (intentionally overloaded). The
full bulk starts normally. The packages tend to get larger and larger as
they are built. dsynth and the pageout daemon operate reasonably well in
this situation. I was mostly looking for excessive stalls due to heavy
memory loads and it looks like the new code handles it quite well.



Revision tags: v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1
# 19b9ca0e 05-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add minor VM shortcuts (2)

* Fix a bug from the last commit. I was trying to shortcut the case where the
vm_page was not flagged MAPPED or WRITEABLE, but didn't read my
own code comment above the conditional and issued a vm_page_free()
without first checking to see if the VM object could be locked.

This led to a livelock in the kernel under heavy loads.

* Rejigger the fix to do the shortcut in a slightly different
place.



# 3a8f8248 04-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add minor VM shortcuts

* Adjust vm_page_hash_elm heuristic to save the full pindex field
instead of just the lower 32 bits.

* Refactor the hash table and hash lookup to index directly to the
potential hit rather than masking to the SET size (~3). This
improves our chances of finding the requested page without having
to iterate (see the sketch at the end of this entry).

The hash table is now N + SET sized and the SET iteration runs
from the potential direct-hit point forwards.

* Minor __predict* code optimizations.

* Shortcut vm_page_alloc() when PG_MAPPED|PG_WRITEABLE are clear
to avoid unnecessary code paths.
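
A minimal sketch of the direct-index-plus-SET lookup described above;
the table size, hash function, and field layout are stand-ins, not the
actual vm_page_hash implementation:

#include <stddef.h>
#include <stdint.h>

#define VM_PAGE_HASH_SIZE   1024        /* illustrative N, power of 2 */
#define VM_PAGE_HASH_SET    4           /* illustrative SET size */

struct vm_page_hash_elm {
    void     *object;                   /* stand-in for the vm_object */
    uint64_t  pindex;                   /* full pindex, not just low 32 bits */
    void     *m;                        /* stand-in for the vm_page */
};

/* Table is N + SET sized so the forward SET scan never has to wrap. */
static struct vm_page_hash_elm vm_page_hash[VM_PAGE_HASH_SIZE + VM_PAGE_HASH_SET];

static unsigned
vm_page_hash_index(const void *object, uint64_t pindex)
{
    uintptr_t h = (uintptr_t)object ^ (uintptr_t)(pindex * 0x9E3779B97F4A7C15ULL);

    return ((unsigned)h & (VM_PAGE_HASH_SIZE - 1));
}

/*
 * Index directly to the most likely slot and then scan SET entries
 * forward, rather than masking down to a SET-aligned group.  The direct
 * hit stays cheap and iteration only happens on collisions.
 */
void *
vm_page_hash_lookup(const void *object, uint64_t pindex)
{
    struct vm_page_hash_elm *elm;
    int i;

    elm = &vm_page_hash[vm_page_hash_index(object, pindex)];
    for (i = 0; i < VM_PAGE_HASH_SET; ++i, ++elm) {
        if (elm->object == object && elm->pindex == pindex)
            return (elm->m);
    }
    return (NULL);
}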



Revision tags: v5.8.0
# c2830aa6 27-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Continue pmap work

* Conditionalize this work on PMAP_ADVANCED, default enabled.

* Remove md_page.pmap_count and md_page.writeable_count, no longer
track these counts, which caused tons of cache line interactions.

However, there are still a few stubborn hold-overs.

* The vm_page still needs to be soft-busied in the page fault path

* For now we need to have a md_page.interlock_count to flag pages
being replaced by pmap_enter() (e.g. COW faults) in order to be
able to safely dispose of the page without busying it.

This need will eventually go away, hopefully just leaving us with
the soft-busy-count issue.



Revision tags: v5.9.0, v5.8.0rc1
# 9ba65fc3 15-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Microoptimization, avoid dirtying vm_page_hash entry

* Avoid dirtying the vm_page_hash entry unnecessarily with a
ticks update if the existing field already has the correct value.

The VM page hash has an extreme level of SMP concurrency, so
avoiding cache coherency contention is important.
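
A minimal sketch of the check-before-write pattern described above; the
structure and field are hypothetical:

struct hash_entry {
    int ticks;                  /* last-use stamp, illustrative only */
    /* ... other fields ... */
};

/*
 * Only store the new stamp if it actually changed, so a read-mostly
 * lookup does not dirty the cache line and force invalidations on
 * other CPUs.
 */
void
hash_entry_touch(struct hash_entry *elm, int cur_ticks)
{
    if (elm->ticks != cur_ticks)
        elm->ticks = cur_ticks;
}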



# 13c79986 15-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Reduce SMP contention during low-memory stress

* When memory gets low vm_page_alloc() is forced to stray into
adjacent VM page queues to find free pages. This search can
expand to the whole queue and cause massive SMP contention on
systems with many cores.

For example, if PQ_FREE has almost no pages but PQ_CACHE has
plenty of pages, the previous scan code widened its search
to the entire PQ_FREE queue (causing a ton of SMP contention)
before beginning a search of PQ_CACHE.

* The new scan code starts in PQ_FREE but once the search widens
sufficiently it will also simultaneously begin searching PQ_CACHE.

This allows the system to continue to allocate memory with minimal
contention as long as PQ_FREE or PQ_CACHE have pages (see the sketch
below).

* The new mechanism integrates a whole lot better with pageout
daemon behavior. The pageout daemon generally triggers off
the FREE+CACHE total rather than off low levels for one or
the other.
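
A minimal sketch of the widening search described above, which begins
probing PQ_CACHE once the PQ_FREE search has widened past a threshold;
the queue type, helper, and threshold are simplified stand-ins:

#include <stddef.h>

#define NQUEUES             1024        /* illustrative queue count, power of 2 */
#define WIDEN_THRESHOLD     16          /* illustrative: when to add PQ_CACHE */

struct page_queue {
    void *first;                        /* stand-in for the page list */
};

/* Stand-in: pull a page off one queue index, NULL if that queue is empty. */
extern void *queue_try_remove(struct page_queue *q);

/*
 * Widen the search radius around the cpu-local starting index.  Once
 * the radius exceeds WIDEN_THRESHOLD, probe PQ_CACHE at the same index
 * in the same pass instead of exhausting all of PQ_FREE first.
 */
void *
page_alloc_search(struct page_queue *pq_free, struct page_queue *pq_cache,
                  int start)
{
    void *m;
    int radius;
    int idx;

    for (radius = 0; radius < NQUEUES; ++radius) {
        idx = (start + radius) & (NQUEUES - 1);
        if ((m = queue_try_remove(&pq_free[idx])) != NULL)
            return (m);
        if (radius >= WIDEN_THRESHOLD &&
            (m = queue_try_remove(&pq_cache[idx])) != NULL)
            return (m);
    }
    return (NULL);
}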



# fb9a5136 14-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase size of the vm_page hash table

* Increase the size of the vm_page hash table used to shortcut
page lookups during a fault. Improves the hit rate on machines
with large amounts of memory.

* Adjust the ticks overflow test from < 0 to < -1 to avoid
getting tripped up by SMP races on the global 'ticks' variable
(which is not accessed atomically). One cpu can conceivably
update a hash ticks value while another cpu is doing a calculation
based on a stale copy of ticks.

Avoids premature vm_page_hash cache evictions due to this race.
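
A minimal sketch of the adjusted age test; the helper and staleness
policy are illustrative only:

/*
 * 'ticks' is read without atomics, so another cpu may stamp an entry
 * with a value one ahead of our stale copy of ticks.  Testing
 * (delta < -1) instead of (delta < 0) tolerates that race and avoids
 * evicting a just-refreshed entry as if it had overflowed.
 */
int
hash_entry_is_stale(int entry_ticks, int cur_ticks, int max_age)
{
    int delta = cur_ticks - entry_ticks;

    if (delta < -1)             /* far in the future: overflow/garbage */
        return (1);
    return (delta > max_age);
}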



# df9266a1 12-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve vm_page_try_to_cache()

* In situations where this function is not able to cache the
page due to the page being dirtied, instead of just returning,
at least ensure that it is moved to the inactive queue if it
is currently on the active queue.



# 4dbfefb6 11-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve pageout daemon pipelining.

* Improve the pageout daemon's ability to pipeline writes to the
swap pager. This deals with a number of low-memory situations
where the pageout daemon was stopping too early (at the minimum
free page mark).

* We don't want the pageout daemon to enforce the paging targets
after a successful pass (as this makes it impossible to actually
use the memory in question), but we DO want it to continue pipelining
if the page stats are still below the hysteresis point governed by
vm_paging_needed().



Revision tags: v5.6.3
# 6d39eb19 02-Dec-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Warn on duplicate physical addresses passed from BIOS

* Warn on duplicate PA free ranges passed from the BIOS instead
of panicking.

* Try to fix buggy Threadripper BIOSes.


# 3f7b7260 23-Oct-2019 Sascha Wildner <saw@online.de>

world/kernel: Use the rounddown2() macro in various places.

Tested-by: zrj


# 0ad80e33 14-Sep-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add needed ccfence and more error checks

* Add a cpu_ccfence() to PMAP_PAGE_BACKING_SCAN in order to prevent
over-optimization of the ipte load by the compiler.

* Add a machine-dependent assertion in the vm_page_free*() path to
ensure that the page is not normally mapped at the time of the
free.



Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0
# fd1fd056 22-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 20 - Fix vmmeter_neg_slop_cnt

* Fix some serious issues with the vmmeter_neg_slop_cnt calculation.
The main problem is that this calculation was then causing
vmstats.v_free_min to be recalculated to a much higher value
than it should have been, resulting in systems starting to page
far earlier than they should.

For example, the 128G TR started paging tmpfs data with 25GB of
free memory, which was not intended. The correct target for that
amount of memory is closer to 3GB.

* Remove vmmeter_neg_slop_cnt entirely and refactor the synchronization
code to be smarter. It will now synchronize vmstats fields whose
adjustments exceed -1024, but only if paging would actually be
needed in the worst-case scenario.

* This algorithm needs low-memory testing and might require more
tuning.



# 0600465e 21-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 17 - Cleanup

* Adjust kmapinfo and vmpageinfo in /usr/src/test/debug.
Enhance the code to display more useful information.

* Get pmap_page_stats_*() working again.

* Change systat -vm's 'VM' reporting. Replace VM-rss with PMAP and
VMRSS. Relabel VM-swp to SWAP and SWTOT.

PMAP - Amount of real memory faulted into user pmaps.

VMRSS - Sum of all process RSS's in the system. This is
the 'virtual' memory faulted into user pmaps and
includes shared pages.

SWAP - Amount of swap space currently in use.

SWTOT - Total amount of swap installed.

* Redocument vm_page.h.

* Remove dead code from pmap.c (some left over cruft from the
days when pv_entry's were used for PTEs).



# 78831f77 20-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 16 - Optimization & cleanup pass

* Adjust __exclusive_cache_line to use 128-byte alignment as
per suggestion by mjg. Use this for the global vmstats.

* Add the vmmeter_neg_slop_cnt global, which is a more generous
dynamic calculation versus -VMMETER_SLOP_COUNT. The idea is to
reduce how often vm_page_alloc() synchronizes its per-cpu statistics
with the global vmstats.



# 831a8507 20-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 15 - Core pmap work, refactor PG_*

* Augment PG_FICTITIOUS. This takes over some of PG_UNMANAGED's previous
capabilities. In addition, the pmap_*() API will work with fictitious
pages, making mmap() operation (e.g. of the GPU) more consistent.

* Add PG_UNQUEUED. This prevents a vm_page from being manipulated in
the vm_page_queues[] in any way. This takes over another feature
of the old PG_UNMANAGED flag.

* Remove PG_UNMANAGED

* Remove PG_DEVICE_IDX. This is no longer relevant. We use PG_FICTITIOUS
for all device pages.

* Refactor vm_contig_pg_alloc(), vm_contig_pg_free(),
vm_page_alloc_contig(), and vm_page_free_contig().

These functions now set PG_FICTITIOUS | PG_UNQUEUED on the returned
pages, and properly clear the bits upon free or if/when a regular
(but special contig-managed) page is handed over to the normal paging
system.

This, combined with making the pmap_*() functions work better with
PG_FICTITIOUS, is the primary 'fix' for some of DRM's hacks.



# f16f9121 20-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 14 - Core pmap work, stabilize for X/drm

* Don't gratuitously change the vm_page flags in the drm code.

The vm_phys_fictitious_reg_range() code in drm_vm.c was clearing
PG_UNMANAGED. It was only luck that this worked before, but
because these are faked pages, PG_UNMANAGED must be set or the
system will implode trying to convert the physical address back
to a vm_page in certain routines.

The ttm code was setting PG_FICTITIOUS in order to prevent the
page from getting into the active or inactive queues (they had
a conditional test for PG_FICTITIOUS). But ttm never cleared
the bit before freeing the page. Remove the hack and instead
fix it in vm_page.c

* In vm_object_terminate(), allow the case where there are still
wired pages in an OBJT_MGTDEVICE object that has wound up on a
queue (don't complain about it). This situation arises because the
ttm code uses the contig malloc API which returns wired pages.

NOTE: vm_page_activate()/vm_page_deactivate() are allowed to mess
with wired pages. Wired pages are not anything 'special' to
the queues, which allows us to avoid messing with the queues
when pages are assigned to the buffer cache.



# e32fb2aa 19-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 13 - Core pmap work, stabilize & optimize

* Refactor the vm_page_hash hash again to get a better distribution.

* I tried to only hash shared objects but this resulted in a number of
edge cases where program re-use could miss the optimization.

* Add a sysctl vm.page_hash_vnode_only (default off). If turned on,
only vm_page's associated with vnodes will be hashed. This should
generally not be necessary.

* Refactor vm_page_list_find2() again to avoid all duplicate queue
checks. This time I mocked the algorithm up in userland and twisted
it until it did what I wanted.

* VM_FAULT_QUICK_DEBUG was accidentally left on, turn it off.

* Do not remove the original page from the pmap when vm_fault_object()
must do a COW. And just in case this is ever added back in later,
don't do it using pmap_remove_specific() !!! Use pmap_remove_pages()
to avoid the backing scan lock.

vm_fault_page() will now do this removal (for procfs rwmem), the normal
vm_fault will of course replace the page anyway, and the umtx code
uses different recovery mechanisms now and should be ok.

* Optimize vm_map_entry_shadow() for the situation where the old
object is no longer shared. Get rid of an unnecessary transient
kmalloc() and vm_object_hold_shared().



# e3c330f0 19-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 12 - Core pmap work, stabilize & optimize

* Add tracking for the number of PTEs mapped writeable in md_page.
Change how PG_WRITEABLE and PG_MAPPED is cleared in the vm_page
to avoid clear/set races. This problem occurs because we would
have otherwise tried to clear the bits without hard-busying the
page. This allows the bits to be set with only an atomic op.

Procedures which test these bits universally do so while holding
the page hard-busied, and now call pmap_mapped_sync() beforehand to
properly synchronize the bits.

* Fix bugs related to various counters: pm_stats.resident_count,
wiring counts, vm_page->md.writeable_count, and
vm_page->md.pmap_count.

* Fix bugs related to synchronizing removed pte's with the vm_page.
Fix one case where we were improperly updating (m)'s state based
on a lost race against a pte swap-to-0 (pulling the pte).

* Fix a bug related to the page soft-busying code when the
m->object/m->pindex race is lost.

* Implement a heuristic version of vm_page_active() which just
updates act_count unlocked if the page is already in the
PQ_ACTIVE queue, or if it is fictitious.

* Allow races against the backing scan for pmap_remove_all() and
pmap_page_protect(VM_PROT_READ). Callers of these routines for
these cases expect full synchronization of the page dirty state.
We can identify when a page has not been fully cleaned out by
checking vm_page->md.pmap_count and vm_page->md.writeable_count.
In the rare situation where this happens, simply retry.

* Assert that the PTE pindex is properly interlocked in pmap_enter().
We still allow PTEs to be pulled by other routines without the
interlock, but multiple pmap_enter()s of the same page will be
interlocked.

* Assert additional wiring count failure cases.

* (UNTESTED) Flag DEVICE pages (dev_pager_getfake()) as being
PG_UNMANAGED. This essentially prevents all the various
reference counters (e.g. vm_page->md.pmap_count and
vm_page->md.writeable_count), PG_M, PG_A, etc from being
updated.

The vm_page's aren't tracked in the pmap at all because there
is no way to find them: they are 'fake', so without a pv_entry,
we can't track them. Instead we simply rely on the vm_map_backing
scan to manipulate the PTEs.

* Optimize the new vm_map_entry_shadow() to use a shared object
token instead of an exclusive one. OBJ_ONEMAPPING will be cleared
with the shared token.

* Optimize single-threaded access to pmaps to avoid pmap_inval_*()
complexities.

* Optimize __read_mostly for more globals.

* Optimize pmap_testbit(), pmap_clearbit(), pmap_page_protect().
Pre-check vm_page->md.writeable_count and vm_page->md.pmap_count
for an easy degenerate return; before real work.

* Optimize pmap_inval_smp() and pmap_inval_smp_cmpset() for the
single-threaded pmap case, when called on the same CPU the pmap
is associated with. This allows us to use simple atomics and
cpu_*() instructions and avoid the complexities of the
pmap_inval_*() infrastructure.

* Randomize the page queue used in bio_page_alloc(). This does not
appear to hurt performance (e.g. heavy tmpfs use) on large many-core
NUMA machines and it makes the vm_page_alloc()'s job easier.

This change might have a downside for temporary files, but for more
long-lasting files there's no point allocating pages localized to a
particular cpu.

* Optimize vm_page_alloc().

(1) Refactor the _vm_page_list_find*() routines to avoid re-scanning
the same array indices over and over again when trying to find
a page.

(2) Add a heuristic, vpq.lastq, for each queue, which we set if a
_vm_page_list_find*() operation had to go far-afield to find its
page. Subsequent finds will skip to the far-afield position until
the current CPU's queues have pages again (see the sketch at the
end of this entry).

(3) Reduce PQ_L2_SIZE from an extravagant 2048 entries per queue down
to 1024. The original 2048 was meant to provide 8-way
set-associativity for 256 cores but wound up reducing performance
due to longer index iterations.

* Refactor the vm_page_hash[] array. This array is used to shortcut
vm_object locks and locate VM pages more quickly, without locks.
The new code limits the size of the array to something more reasonable,
implements a 4-way set-associative replacement policy using 'ticks',
and rewrites the hashing math.

* Effectively remove pmap_object_init_pt() for now. In current tests
it does not actually improve performance, probably because it may
map pages that are not actually used by the program.

* Remove vm_map_backing->refs. This field is no longer used.

* Remove more of the old now-stale code related to use of pv_entry's
for terminal PTEs.

* Remove more of the old shared page-table-page code. This worked but
could never be fully validated and was prone to bugs. So remove it.
In the future we will likely use larger 2MB and 1GB pages anyway.

* Remove pmap_softwait()/pmap_softhold()/pmap_softdone().

* Remove more #if 0'd code.
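
A minimal sketch of the vpq.lastq hint from the vm_page_alloc() notes
above; the structures and search loop are simplified stand-ins for the
real _vm_page_list_find*() code:

#include <stddef.h>

#define PQ_L2_SIZE      1024            /* per the note above */
#define FAR_THRESHOLD   8               /* illustrative: what counts as far-afield */

struct vpgqueues {
    void *first;                        /* stand-in for the page list */
    int   lastq;                        /* hint: last far-afield offset that worked */
};

/* Stand-in: pull a page off one queue, NULL if it is empty. */
extern void *queue_try_remove(struct vpgqueues *vpq);

void *
page_list_find(struct vpgqueues *queues, int basequeue)
{
    struct vpgqueues *base = &queues[basequeue];
    void *m;
    int i;
    int idx;

    /* Home queue first; if it has pages again, drop the hint. */
    if ((m = queue_try_remove(base)) != NULL) {
        base->lastq = 0;
        return (m);
    }

    /* Otherwise skip straight to the remembered far-afield offset. */
    for (i = base->lastq ? base->lastq : 1; i < PQ_L2_SIZE; ++i) {
        idx = (basequeue + i) & (PQ_L2_SIZE - 1);
        if ((m = queue_try_remove(&queues[idx])) != NULL) {
            base->lastq = (i >= FAR_THRESHOLD) ? i : 0;
            return (m);
        }
    }
    base->lastq = 0;
    return (NULL);
}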



# 530e94fc 17-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 9 - Precursor work for terminal pv_entry removal

* Cleanup the API a bit

* Get rid of pmap_enter_quick()

* Remove unused procedures.

* Document that vm_page_protect() (and thus the related
pmap_page_protect()) must be called with a hard-busied page. This
ensures that the operation does not race a new pmap_enter() of the page.

show more ...


Revision tags: v5.4.3
# 67e7cb85 14-May-2019 Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 8 - Precursor work for terminal pv_entry removal

* Adjust structures so the pmap code can iterate backing_ba's with
just the vm_object spinlock.

Add a ba.pmap back-pointer.

Move entry->start and entry->end into the ba (ba.start, ba.end).
This replicates the base entry->ba.start and entry->ba.end,
but local modifications are locked by individual objects to allow
pmap ops to just look at backing ba's iterated via the object.

Remove the entry->map back-pointer.

Remove the ba.entry_base back-pointer.

* ba.offset is now an absolute offset and not additive. Adjust all code
that calculates and uses ba.offset (fortunately it is all concentrated
in vm_map.c and vm_fault.c).

* Refactor ba.start/offset/end modifications to be atomic with
the necessary spin-locks to allow the pmap code to safely iterate
the vm_map_backing list for a vm_object.

* Test VM system with full synth run.


