History log of /dragonfly/sys/vm/vm_page.c (Results 51 – 75 of 192)
Revision    Date    Author    Comments
# 95270b7e 01-Feb-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Many fixes for vkernel support, plus a few main kernel fixes

REAL KERNEL

* The big enchilada is that the main kernel's thread switch code has
a small timing window where it clears the PM_ACTIVE bit for the cpu
while switching between two threads. However, it *ALSO* checks and
avoids loading the %cr3 if the two threads have the same pmap.

This results in a situation where an invalidation on the pmap in another
cpu may not be visible to the cpu doing the switch, and yet the
cpu doing the switch also decides not to reload %cr3 and so does not
invalidate the TLB either. The result is a stale TLB and bad things
happen.

For now just unconditionally load %cr3 until I can come up with code
to handle the case; a minimal sketch of the race appears after this list.

This bug is very difficult to reproduce on a normal system, it requires
a multi-threaded program doing nasty things (munmap, etc) on one cpu
while another thread is switching to a third thread on some other cpu.

* KNOTE after handling the vkernel trap in postsig() instead of before.

* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs
argument instead of a 32-bit npgs argument. This fixes situations
that crop up when a process uses more than 16TB of address space.

* Add an lfence to the pmap invalidation code that I think might be
needed.

* Handle some wrap/overflow cases in pmap_scan() related to the use of
large address spaces.

* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.

* Test PG_RW after locking the pv_entry to handle potential races.

* Add bio_crc to struct bio. This field is only used for debugging for
now but may come in useful later.

* Add some global debug variables in the pmap_inval_smp() and related
paths. Refactor the npgs handling.

* Load the tsc_target field after waiting for completion of the previous
invalidation op instead of before. Also add a conservative mfence()
in the invalidation path before loading the info fields.

* Remove the global pmap_inval_bulk_count counter.

* Adjust swtch.s to always reload the user process %cr3, with an
explanation. FIXME LATER!

* Add some test code to vm/swap_pager.c which double-checks that the page
being paged out does not get corrupted during the operation. This code
is #if 0'd.

* We must hold an object lock around the swp_pager_meta_ctl() call in
swp_pager_async_iodone(). I think.

* Reorder when PG_SWAPINPROG is cleared. Finish the I/O before clearing
the bit.

* Change the vm_map_growstack() API to pass a vm_map in instead of
curproc.

* Use atomic ops for vm_object->generation counts, since objects can be
locked shared.
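
The race in the first item above, sketched minimally. All names here are
illustrative (the real code is in swtch.s and the pmap switch path); this
is a simplification, not the actual implementation:

    /* cpu_switch(), hypothetical simplification of the buggy path */
    pmap_clear_active(oldpmap, mycpuid);    /* PM_ACTIVE window opens here */
    if (newpmap != oldpmap)                 /* same pmap: %cr3 not reloaded, */
            load_cr3(newpmap->pm_cr3);      /* so an invalidation issued by  */
                                            /* another cpu in the window is  */
                                            /* never flushed from the TLB    */
    pmap_set_active(newpmap, mycpuid);

    /* interim fix from this commit: drop the test, always reload */
    load_cr3(newpmap->pm_cr3);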

VKERNEL

* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN.
This solves a severe FP corruption bug in the vkernel due to calls it
makes into libc (which uses %xmm registers all over the place).

This is not a complete fix. We need a formal userspace/kernelspace FP
abstraction. Right now the vkernel doesn't have a kernelspace FP
abstraction so if a kernel thread switches preemptively bad things
happen.

* The kernel tracks and locks pv_entry structures to interlock pte's.
The vkernel never caught up, and does not really have a pv_entry or
placemark mechanism. The vkernel's pmap really needs a complete
re-port from the real-kernel pmap code. Until then, we use poor hacks.

* Use the vm_page's spinlock to interlock pte changes.

* Make sure that PG_WRITEABLE is set or cleared with the vm_page
spinlock held (see the sketch after this list).

* Have pmap_clearbit() acquire the pmobj token for the pmap in the
iteration. This appears to be necessary, currently, as most of the
rest of the vkernel pmap code also uses the pmobj token.

* Fix bugs in the vkernel's swapu32() and swapu64().

* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy
the page. Note however that a page table page is currently never
soft-busied. Also adjust other vkernel code that busies a page table page.

* Fix some sillycode in a pmap->pm_ptphint test.

* Don't inherit e.g. PG_M from the previous pte when overwriting it
with a pte of a different physical address.

* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW
(which also clears VPTE_M), and not just VPTE_M. Formally we want
the vkernel to be notified when a page becomes modified and it won't
be unless we also clear VPTE_RW and force a fault. <--- I may change
this back after testing.

* Wrap pmap_replacevm() with a critical section.

* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle
it (vm_fault_page() just now got the ability).

* Properly flag VM_FAULT_USERMODE.
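
A minimal sketch of the PG_WRITEABLE rule above, assuming the usual
vm_page spin lock and flag helpers (illustrative; the actual call sites
are scattered through the vkernel pmap code):

    vm_page_spin_lock(m);
    vm_page_flag_set(m, PG_WRITEABLE);      /* or vm_page_flag_clear() */
    vm_page_spin_unlock(m);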



# e4b2227a 30-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add missing flag to vm_page_alloc() initializer.

* vm_page_alloc() (and the contig functions too) clear most vm_page->flags,
but not all. PG_ACTIONLIST was being improperly cleared.

* Fix the case (may fix occasional races in usched).

* Add a #define to ensure the flags we need to keep are defined in only
one place.
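
A sketch of what such a single definition might look like; the macro name
and the exact flag set shown here are illustrative, not the actual
vm_page.h contents:

    /* flags vm_page_alloc() must preserve when it reinitializes a page */
    #define PG_KEEP_NEWPAGE_FLAGS   (PG_ACTIONLIST | PG_FICTITIOUS)

    m->flags = (m->flags & PG_KEEP_NEWPAGE_FLAGS) | wanted_flags;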



# 0062b9ff 26-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove object->agg_pv_list_count

* Remove the object->agg_pv_list_count field. It represents an unnecessary
global cache bounce, was only being used to help report vkernel RSS,
and wasn't working very well anyway.



# c183e2fc 22-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve vm_page_register_action*() performance.

* Improve the performance for vm_page_register_action() and related
routines by splitting the global lock into per-hash-index locks.
Also change from a token to lockmgr locks; a sketch of the idea follows
this list.

* Shift some code around in umtx_sleep() so the tsleep_interlock()
occurs after the registration code to avoid interference with
the new lockmgr() operations in the registration code.
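
An illustrative sketch of the per-hash-index locking in the first item.
The table size, hash, and names are made up; the real table lives in
vm_page.c:

    #define ACTION_HSIZE    256
    #define ACTION_HMASK    (ACTION_HSIZE - 1)

    static struct lock action_locks[ACTION_HSIZE];  /* was one global token */

    static __inline struct lock *
    action_hash_lock(vm_page_t m)
    {
            return (&action_locks[(m->pindex ^ (uintptr_t)m->object) &
                                  ACTION_HMASK]);
    }

    /* register / unregister paths then bracket their work with: */
    lockmgr(action_hash_lock(m), LK_EXCLUSIVE);
    /* ... manipulate the per-hash action list ... */
    lockmgr(action_hash_lock(m), LK_RELEASE);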



# 75979118 09-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Further refactor vmstats, adjust page coloring algorithm

* Further refactor vmstats by tracking adjustments in gd->gd_vmstats_adj
and doing a copyback of the global vmstats into gd->gd_vmstats. All
critical code paths access the localized copy to test VM state, removing
most global cache ping pongs of the global structure. The global
structure 'vmstats' still contains the master copy.

* Bump PQ_L2_SIZE up to 512. We use this to localize the VM page queues.
Make some adjustments to the pg_color calculation to reduce (in fact
almost eliminate) SMP conflicts on the vm_page_queue[] between cpus
when the VM system is operating normally (not paging).

* This pumps the 4-socket opteron test system up to ~4.5-4.7M page
faults/sec in testing (using a mmap/bzero/munmap loop on 16MB x N
processes).

This pumps the 2-socket xeon test system up to 4.6M page faults/sec
with 32 threads (250K/sec on one core, 1M on 4 cores, 4M on 16 cores,
5.6M on 32 threads). This is near the theoretical maximum possible for
this test.

* In this particular page fault test, PC sampling indicates *NO* further
globals are undergoing cache ping-ponging. The PC sampling predominantly
indicates pagezero(), which is expected. The Xeon is zeroing an aggregate
of 22GBytes/sec at 32 threads running normal vm_fault's.



# 5ba14d44 08-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove most global atomic ops for VM page statistics

* Use a pcpu globaldata->gd_vmstats to update page statistics.

* Hardclock rolls the individual stats into the global vmstats structure.

* Force-roll any pcpu stat that goes below -10, to ensure that the low-memory
handling algorithms still work properly.
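
A rough sketch of the scheme, assuming a vmstats copy embedded in
globaldata; the field and helper names here are illustrative:

    /* hot path: adjust only the per-cpu copy, no global atomic op */
    gd->gd_vmstats.v_free_count += delta;
    if (gd->gd_vmstats.v_free_count < -10)  /* keep low-memory checks honest */
            vmstats_rollup_cpu(gd);         /* hypothetical forced roll-up */

    /* hardclock (and the forced path above): fold into the master copy */
    atomic_add_int(&vmstats.v_free_count, gd->gd_vmstats.v_free_count);
    gd->gd_vmstats.v_free_count = 0;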



# 6ba5daf8 08-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Move vm_page spin locks from pool to vm_page structure

* Move the vm_page spin lock from a pool to per-structure. This does bloat
the vm_page structure, but clears up an area of contention under heavy
VM loads.



# 070a58b3 07-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement CPU localization hinting for low level page allocations

* By default vm_page_alloc() and kmem_alloc*() localize to the calling cpu.

* A cpu override may be passed in the flags to make these functions localize
differently.

* Currently implemented as a test only for the pcpu globaldata, idle
thread, and stacks for kernel threads targeted to specific cpus.



# 6f2099fe 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)

* Fix miscellaneous bugs in the recent NUMA commits.

* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will
disable the NUMA code. Note that NUMA is only applicable on multi-socket
systems.
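
For example, disabling it would look like this in /boot/loader.conf:

    # /boot/loader.conf
    kern.numa_disable=1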



# c7f9edd8 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add NUMA awareness to vm_page_alloc() and related functions

* Add NUMA awareness to the kernel memory subsystem. This first iteration
will primarily affect user pages. kmalloc and objcache are not
NUMA-friendly yet (and it's questionable how useful it would be to make
them so).

* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket
xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.



# 77c48adb 06-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor phys_avail[] and dump_avail[]

* Refactor phys_avail[] and dump_avail[] into a more understandable
structure.


# fde6be6a 03-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - vm_object work

* Adjust OBJT_SWAP object management to be more SMP friendly. The hash
table now uses a combined structure to reduce unnecessary cache
interactions.

* Allocate VM objects via kmalloc() instead of zalloc, and remove the
zalloc pool for VM objects. Early initialization of the kernel does not
need to access vm_object allocation functions until after basic VM
initialization.

* Remove a vm_page_cache console warning that is no longer applicable.
(It could be triggered by the RSS rlimit handling code).



# da2da420 01-Jan-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix bugs in recent RSS/swap commits

* Refactor the vm_page_try_to_cache() call to take a page already busied,
and fix a case where it was previously being called improperly that left
a VM page permanently busy.



# 534ee349 28-Dec-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement RLIMIT_RSS, Increase maximum supported swap

* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS
exceeds the rlimit. Currently the algorithm used to choose the pages
is fairly unsophisticated (we don't have the luxury of a per-process
vm_page_queues[] array).

* Implement the swap_user_async sysctl, default off. This sysctl can be
set to 1 to enable asynchronous paging in the RSS code. This is mostly
for testing and is not recommended since it allows the process to eat
memory more quickly than it can be paged out.

* Reimplement vm.swap_burst_read so the sysctl now specifies the number
of pages that are allowed to be burst. Still disabled by default (will
be enabled in a followup commit).

* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.

* Refactor some of the pageout code to support synchronous direct
paging, which the RSS code uses. The new code also implements a
feature that will move clean pages to PQ_CACHE, making them immediately
reallocatable.

* Refactor the vm_pageout_deficit variable, using atomic ops.

* Fix an issue in vm_pageout_clean() (originally part of the inactive scan)
which prevented clustering from operating properly on write.

* Refactor kern/subr_blist.c and all associated code that uses it to
increase swblk_t from int32_t to int64_t, and to increase the radix
supported from 31 bits to 63 bits.

This increases the maximum supported swap from 2TB to some ungodly large
value. Remember that, by default, space for up to 4 swap devices
is preallocated so if you are allocating insane amounts of swap it is
best to do it with four equal-sized partitions instead of one so kernel
memory is efficiently allocated.

* There are two kernel data structures associated with swap. The blmeta
structure which has approximately a 1:8192 ratio (ram:swap) and is
pre-allocated up-front, and the swmeta structure whose KVA is reserved
but not allocated.

The swmeta structure has a 1:341 ratio. It tracks swap assignments for
pages in vm_object's. The kernel limits the number of structures to
approximately half of physical memory, meaning that if you have a machine
with 16GB of ram the maximum amount of swapped-out data you can support
with that is 16/2*341 = 2.7TB. Not that you would actually want to eat
half your ram to actually do that.

A large system with, say, 128GB of ram, would be able to support
128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM.
The swap system can use up to 256GB of this so the maximum swap currently
supported by DragonFly on a machine with > 512GB of ram is going to be
256/2*341 = 43TB. To expand this further would require some adjustments
to increase the amount of KVM supported by the kernel.

* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory
can be reused for swmeta but cannot be freed for use by other subsystems.
You should only configure as much swap as you are willing to reserve ram
for.
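
A back-of-the-envelope helper mirroring the arithmetic above (illustrative
only, not kernel code; assumes the ~1:341 swmeta ratio and the cap of half
of physical ram):

    static uint64_t
    approx_max_swap(uint64_t ram_bytes)
    {
            /* e.g. 16GB of ram -> 8GB eaten by swmeta -> ~2.7TB of swap */
            return ((ram_bytes / 2) * 341);
    }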



# a6225b5b 28-Nov-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix deadlock in vm_page_repurpose()

* vm_page_repurpose() was hard+soft busying the underlying VM page,
which can deadlock against putpages or other I/O.

* Only hard-busy the page, then add an SBUSY test to the failure case
(we don't want to repurpose a page undergoing I/O after all!).



Revision tags: v4.6.1
# 2c9e2984 08-Oct-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix low memory process kill bug

* If a process is being killed, don't let it sit in a low-memory
vm_wait loop in kernel mode; it will never exit.

* Try to improve the chances that we can dump by adjusting an assertion in
the user thread scheduler.



# afd2da4d 03-Aug-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely

* Remove the PG_ZERO flag and remove all page-zeroing optimizations,
entirely. After doing a substantial amount of testing, these
optimizations, which existed all the way back to CSRG BSD, no longer
provide any benefit on a modern system.

- Pre-zeroing a page only takes 80ns on a modern cpu. vm_fault overhead
in general is at least ~1 microsecond.

- Pre-zeroing a page leads to a cold-cache case on use, forcing the fault
source (e.g. a userland program) to actually get the data from main
memory in its likely immediate use of the faulted page, reducing
performance.

- Zeroing the page at fault-time is actually more optimal because it does
not require any reading of dynamic ram and leaves the cache hot.

- Multiple synth and build tests show that active idle-time zeroing of
pages actually reduces performance somewhat and incidental allocations
of already-zeroed pages (from page-table tear-downs) do not affect
performance in any meaningful way.

* Remove bcopyi() and obbcopy() -> collapse into bcopy(). These other
versions existed because bcopy() used to be specially-optimized and
could not be used in all situations. That is no longer true.

* Remove bcopy function pointer argument to m_devget(). It is no longer
used. This function existed to help support ancient drivers which might
have needed a special memory copy to read and write mapped data. It has
long been supplanted by BUSDMA.



# 7d86823d 01-Aug-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Cleanup vm_page_pcpu_cache()

* Remove the empty vm_page_pcpu_cache() function and related call. Page
affinity is handled by the vm_page_queues[] array now.


Revision tags: v4.6.0
# 9002b0d5 30-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpu localization for VM page allocations (2)

* Finish up the refactoring. Localize backoffs for search failures
by doing a masked domain search. This avoids bleeding into non-local
page queues until we've completely exhausted our local queues,
regardless of the starting pg_color index.

* We try to maintain 16-way set associativity for VM page allocations
even if the topology does not allow us to do it perfectly. So, for
example, a 4-socket x 12-core (48-core) opteron can break the 256
queues into 4 x 64 queues, then split the 12-cores per socket into
sets of 3 giving 16 queues (the minimum) to each set of 3 cores.

* Refactor the page-zeroing code to only check the localized area.
This fixes a number of issues related to the zeroed pages in the
queues winding up severely unbalanced. Other cpus in the local
group can help replenish a particular cpu's pre-zeroed pages but
we intentionally allow a heavy user to exhaust the pages.

* Adjust the cpu topology code to normalize the physical package id.
Some machines start at 1, some machines start at 0. Normalize
everything to start at 0.



# 33ee48c4 30-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cpu localization for VM page allocations

* Change how cpu localization works. The old scheme was extremely unbalanced
in terms of vm_page_queue[] load.

The new scheme uses cpu topology information to break the vm_page_queue[]
down into major blocks based on the physical package id, minor blocks
based on the core id in each physical package, and then by 1's based on
(pindex + object->pg_color). A sketch of the resulting index computation
follows this list.

If PQ_L2_SIZE is not big enough such that 16-way operation is attainable
by physical and core id, we break the queue down only by physical id.

Note that the core id is a real core count, not a cpu thread count, so
an 8-core/16-thread x 2 socket xeon system will just fit in the 16-way
requirement (there are 256 PQ_FREE queues).

* When a particular queue does not have a free page, iterate nearby queues
starting at +/- 1 (previously we started at +/- PQ_L2_SIZE/2), in an attempt
to retain as much locality as possible. This won't be perfect but it should
be good enough.

* Also fix an issue with the idlezero counters.
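
An illustrative sketch of the index computation described in the first
item. The topology variables are hypothetical; only PQ_L2_SIZE and the
(pindex + pg_color) term come from the commit text:

    /*
     * Queues are carved up as [physical package][core][low bits], with
     * the low bits driven by (pindex + object->pg_color).
     */
    int qidx;

    qidx  = socket_id * (PQ_L2_SIZE / nsockets);
    qidx += core_id * (PQ_L2_SIZE / (nsockets * ncores_per_socket));
    qidx += (pindex + object->pg_color);
    qidx &= PQ_L2_SIZE - 1;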



# bca42d4f 29-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Reduce memory testing and early-boot zeroing.

* Reduce the amount of memory testing and early-boot zeroing that
we do, improving boot times on systems with large amounts of memory.

* Fix race in the page zeroing count.

* Refactor the VM zeroidle code. Instead of having just one kernel thread,
have one on each cpu.

This significantly increases the rate at which the machine can eat up
idle cycles to pre-zero pages in the cold path, improving performance
of hot-path (normal) page allocations which request zeroed pages.

* On systems with a lot of cpus there is usually a little idle time (e.g.
0.1%) on a few of the cpus, even under extreme loads. At the same time,
such loads might also imply a lot of zfod faults requiring zero'd pages.

On our 48-core opteron we see a zfod rate of 1.0 to 1.5 GBytes/sec and
a page-freeing rate of 1.3 - 2.5 GBytes/sec. Distributing the page
zeroing code and eating up these minuscule bits of idle improves the
kernel's ability to provide a pre-zeroed page (vs having to zero it in
the hot path) significantly.

Under the synth test load the kernel was still able to provide 400-700
MBytes/sec worth of pre-zeroed pages whereas before this change the kernel
was only able to provide 20 MBytes/sec worth of pre-zeroed pages.



Revision tags: v4.6.0rc2, v4.6.0rc, v4.7.0
# 962f16a7 19-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - repurpose buffer cache entries under heavy I/O loads

* At buffer-cache I/O loads > 200 MBytes/sec (newbuf instantiations, not
cached buffer use), the buffer cache will now attempt to repurpose the
VM pages in the buffer it is recycling instead of returning the pages
to the VM system.

* sysctl vfs.repurposedspace may be used to adjust the I/O load limit.

* The repurposing code attempts to free the VM page then reassign it to
the logical offset and vnode of the new buffer. If this succeeds, the
new buffer can be returned to the caller without having to run any
SMP tlb operations. If it fails, the pages will be either freed or
returned to the VM system and the buffer cache will act as before.
A sketch of this decision appears after this list.

* The I/O load limit has a secondary beneficial effect which is to reduce
the allocation load on the VM system to something the pageout daemon can
handle while still allowing new pages up to the I/O load limit to transfer
to VM backing store. Thus, this mechanism ONLY affects systems with I/O
loads above 200 MBytes/sec (or whatever programmed value you decide
on).

* Pages already in the VM page cache do not count towards the I/O load limit
when reconstituting a buffer.
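
A condensed, hypothetical sketch of the repurposing decision; the helper
names and exact failure conditions are made up, and the real logic lives
in the buffer cache code:

    /*
     * Try to keep the page and re-associate it with the new buffer's
     * (vnode object, offset) instead of freeing it to the VM system.
     * Returning 0 falls back to the normal path.
     */
    static int
    repurpose_page(vm_page_t m, vm_object_t nobj, vm_pindex_t npindex)
    {
            if (m->wire_count > 1 || m->hold_count ||
                (m->flags & PG_NEED_COMMIT)) {
                    return (0);             /* cannot repurpose safely */
            }
            vm_page_rename(m, nobj, npindex);   /* hypothetical re-association */
            return (1);                     /* no SMP tlb ops required */
    }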



# dc6a6bd2 18-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor buffer cache code in preparation for vm_page repurposing

* Keep buffer_map but no longer use vm_map_findspace/vm_map_delete to manage
buffer sizes. Instead, reserve MAXBSIZE of unallocated KVM for each buffer.

* Refactor the buffer cache management code. bufspace exhaustion now has
hysteresis, bufcount works just about the same.

* Start work on the repurposing code (currently disabled).



Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
# 9cd626ca 20-Aug-2015 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix several low memory+swap pageout/killproc issues

* Add significant slop to the PQAVERAGE() calculation in the
pageout daemon and increase the slop in the vm_paging_target()
test the pageout daemon makes to determine if it can stop early.

These adjustments fix a degenerate case when no swap is configured
and a large number of clean pages are present in the inactive queue
which could prevent the pageout daemon from cleaning a sufficient number
of pages and cause it to start killing processes even when plenty of
freeable memory exists.

* Impose a one-second delay when killing processes due to insufficient
memory + swap. This reduces the chance that multiple processes will
be killed even if the first one would have been sufficient by giving
the kernel more time to dispose of the process.

* Fix a bug in vm_page_alloc(). When v_free_count exactly matches
v_free_reserved it successfully passes the vm_page_count_target()
test but vm_page_alloc() will still fail. This results in a livelock
in vm_fault_object() and will livelock the pageout daemon vs a user
process stuck in vm_fault(), causing the machine to lock.

Fixed by adjusting the conditional test in vm_page_alloc().

Reported-by: luxh
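
The shape of the boundary problem, illustratively (not the actual diff):

    /* pageout daemon's "enough memory" test: satisfied at equality */
    if (vmstats.v_free_count >= vmstats.v_free_reserved)
            ;       /* daemon may stop early */

    /*
     * ...but the allocation path effectively required strictly more
     * than the reserve, so at exact equality vm_page_alloc() kept
     * failing and vm_fault_object() spun, livelocking the machine.
     */
    if (vmstats.v_free_count > vmstats.v_free_reserved)
            ;       /* normal allocation allowed */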



Revision tags: v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc
# 5a05c8a5 09-Jun-2015 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Increase DMA reserve from 16M to 128M by default

* People running DragonFly on workstations were having to specify
more than the default 16M for vm.dma_reserved in /boot/loader.conf
or their X sessions would not be stable.

* To reduce confusion, the dma_reserved default is being increased
to 128M which should be sufficient for most display setups.

People with headless servers will have to explicitly reduce the
reservation in /boot/loader.conf (back to 16m is my suggestion) if
they wish to recover the memory; see the example after this list.

* This is the best compromise I could think of. We can't just return
the memory to the pool after boot because X might be started far later
on, or even potentially killed and restarted. Other drivers might also
depend on large swaths of contiguous physical memory being available.

The reserve is the best way to do it and I would rather things work out
of the box rather than forcing regular users to set something in
/boot/loader.conf.
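
For a headless box, the reservation can be dropped back down in
/boot/loader.conf, for example:

    # /boot/loader.conf
    vm.dma_reserved=16m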


