95270b7e | 01-Feb-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Many fixes for vkernel support, plus a few main kernel fixes
REAL KERNEL
* The big enchilada is that the main kernel's thread switch code has a small timing window where it clears the PM_ACTIVE bit for the cpu while switching between two threads. However, it *ALSO* checks and avoids loading the %cr3 if the two threads have the same pmap.
This results in a situation where an invalidation on the pmap from another cpu may not have visibility to the cpu doing the switch, and yet the cpu doing the switch also decides not to reload %cr3 and so does not invalidate the TLB either. The result is a stale TLB and bad things happen.
For now just unconditionally load %cr3 until I can come up with code to handle the case.
This bug is very difficult to reproduce on a normal system, it requires a multi-threaded program doing nasty things (munmap, etc) on one cpu while another thread is switching to a third thread on some other cpu.
* KNOTE after handling the vkernel trap in postsig() instead of before.
* Change the kernel's pmap_inval_smp() code to take a 64-bit npgs argument instead of a 32-bit npgs argument. This fixes situations that crop up when a process uses more than 16TB of address space.
* Add an lfence to the pmap invalidation code that I think might be needed.
* Handle some wrap/overflow cases in pmap_scan() related to the use of large address spaces.
* Fix an unnecessary invltlb in pmap_clearbit() for unmanaged PTEs.
* Test PG_RW after locking the pv_entry to handle potential races.
* Add bio_crc to struct bio. This field is only used for debugging for now but may come in useful later.
* Add some global debug variables in the pmap_inval_smp() and related paths. Refactor the npgs handling.
* Load the tsc_target field after waiting for completion of the previous invalidation op instead of before. Also add a conservative mfence() in the invalidation path before loading the info fields.
* Remove the global pmap_inval_bulk_count counter.
* Adjust swtch.s to always reload the user process %cr3, with an explanation. FIXME LATER!
* Add some test code to vm/swap_pager.c which double-checks that the page being paged out does not get corrupted during the operation. This code is #if 0'd.
* We must hold an object lock around the swp_pager_meta_ctl() call in swp_pager_async_iodone(). I think.
* Reorder when PG_SWAPINPROG is cleared. Finish the I/O before clearing the bit.
* Change the vm_map_growstack() API to pass a vm_map in instead of curproc.
* Use atomic ops for vm_object->generation counts, since objects can be locked shared.
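The switch-time %cr3/TLB race described in the first bullet above can be sketched, very loosely, in C. All names here (cpu_state, cpu_switch_old, cpu_switch_fixed) are invented for illustration; this is not the real swtch.s logic, just the shape of the bug and the interim fix:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state; field names are invented. */
struct cpu_state {
    int  cr3_pmap;      /* which pmap's page tables %cr3 points at */
    bool tlb_stale;     /* a remote invalidation was missed by this cpu */
};

/* Old behavior: skip the %cr3 reload when both threads share a pmap. */
static void cpu_switch_old(struct cpu_state *cpu, int old_pmap, int new_pmap)
{
    if (new_pmap != old_pmap) {
        cpu->cr3_pmap = new_pmap;
        cpu->tlb_stale = false;     /* the %cr3 load flushes the TLB */
    }
    /* else: no reload, so a missed remote invalidation stays stale */
}

/* Interim fix: unconditionally reload %cr3, always flushing the TLB. */
static void cpu_switch_fixed(struct cpu_state *cpu, int old_pmap, int new_pmap)
{
    (void)old_pmap;
    cpu->cr3_pmap = new_pmap;
    cpu->tlb_stale = false;
}
```

The cost of the fix is an unnecessary TLB flush on same-pmap switches, which is why the commit flags it FIXME LATER.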
VKERNEL
* Unconditionally save the FP state after returning from VMSPACE_CTL_RUN. This solves a severe FP corruption bug in the vkernel due to calls it makes into libc (which uses %xmm registers all over the place).
This is not a complete fix. We need a formal userspace/kernelspace FP abstraction. Right now the vkernel doesn't have a kernelspace FP abstraction so if a kernel thread switches preemptively bad things happen.
* The kernel tracks and locks pv_entry structures to interlock pte's. The vkernel never caught up, and does not really have a pv_entry or placemark mechanism. The vkernel's pmap really needs a complete re-port from the real-kernel pmap code. Until then, we use poor hacks.
* Use the vm_page's spinlock to interlock pte changes.
* Make sure that PG_WRITEABLE is set or cleared with the vm_page spinlock held.
* Have pmap_clearbit() acquire the pmobj token for the pmap in the iteration. This appears to be necessary, currently, as most of the rest of the vkernel pmap code also uses the pmobj token.
* Fix bugs in the vkernel's swapu32() and swapu64().
* Change pmap_page_lookup() and pmap_unwire_pgtable() to fully busy the page. Note however that a page table page is currently never soft-busied. Also adjust other vkernel code that busies a page table page.
* Fix some sillycode in a pmap->pm_ptphint test.
* Don't inherit e.g. PG_M from the previous pte when overwriting it with a pte of a different physical address.
* Change the vkernel's pmap_clear_modify() function to clear VPTE_RW (which also clears VPTE_M), and not just VPTE_M. Formally we want the vkernel to be notified when a page becomes modified and it won't be unless we also clear VPTE_RW and force a fault. <--- I may change this back after testing.
* Wrap pmap_replacevm() with a critical section.
* Scrap the old grow_stack() code. vm_fault() and vm_fault_page() handle it (vm_fault_page() just now got the ability).
* Properly flag VM_FAULT_USERMODE.

e4b2227a | 30-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Add missing flag to vm_page_alloc() initializer.
* vm_page_alloc() (and the contig functions too) clear most vm_page->flags, but not all. PG_ACTIONLIST was being improperly cleared.
* Fix the case (may fix occasional races in usched).
* Add a #define to ensure the flags we need to keep are defined in only one place.

0062b9ff | 26-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove object->agg_pv_list_count
* Remove the object->agg_pv_list_count field. It represents an unnecessary global cache bounce, was only being used to help report vkernel RSS, and wasn't working very well anyway.

c183e2fc | 22-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Improve vm_page_register_action*() performance.
* Improve the performance for vm_page_register_action() and related routines by splitting the global lock into per-hash-index locks. Also change from a token to lockmgr locks.
* Shift some code around in umtx_sleep() so the tsleep_interlock() occurs after the registration code to avoid interference with the new lockmgr() operations in the registration code.

75979118 | 09-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Further refactor vmstats, adjust page coloring algorithm
* Further refactor vmstats by tracking adjustments in gd->gd_vmstats_adj and doing a copyback of the global vmstats into gd->gd_vmstats. All critical code paths access the localized copy to test VM state, removing most global cache ping-pongs on the global structure. The global structure 'vmstats' still contains the master copy.
* Bump PQ_L2_SIZE up to 512. We use this to localize the VM page queues. Make some adjustments to the pg_color calculation to reduce (in fact, almost eliminate) SMP conflicts on the vm_page_queue[] between cpus when the VM system is operating normally (not paging).
* This pumps the 4-socket opteron test system up to ~4.5-4.7M page faults/sec in testing (using a mmap/bzero/munmap loop on 16MB x N processes).
This pumps the 2-socket xeon test system up to 4.6M page faults/sec with 32 threads (250K/sec on one core, 1M on 4 cores, 4M on 16 cores, 5.6M on 32 threads). This is near the theoretical maximum possible for this test.
* In this particular page fault test, PC sampling indicates *NO* further globals are undergoing cache ping-ponging. The PC sampling predominantly indicates pagezero(), which is expected. The Xeon is zeroing an aggregate of 22GBytes/sec at 32 threads running normal vm_fault's.

5ba14d44 | 08-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove most global atomic ops for VM page statistics
* Use a pcpu globaldata->gd_vmstats to update page statistics.
* Hardclock rolls the individual stats into the global vmstats structure.
* Force-roll any pcpu stat that goes below -10, to ensure that the low-memory handling algorithms still work properly.
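The per-cpu statistics scheme above can be sketched as follows. The names (pcpu_adj, stat_adj, stat_rollup) and the NCPUS value are invented for illustration; the real code lives in the globaldata structure and hardclock, and the -10 force-roll threshold comes from the text:

```c
#include <assert.h>

#define NCPUS 4
#define ROLL_THRESHOLD (-10)

static long global_free_count;      /* master copy (the 'vmstats' analogue) */
static long pcpu_adj[NCPUS];        /* per-cpu deltas (gd_vmstats_adj analogue) */

/* Stat updates only touch the local cpu's delta: no global atomic ops. */
static void stat_adj(int cpu, long delta)
{
    pcpu_adj[cpu] += delta;
    /* Force-roll large negative deltas so low-memory checks stay honest. */
    if (pcpu_adj[cpu] < ROLL_THRESHOLD) {
        global_free_count += pcpu_adj[cpu];
        pcpu_adj[cpu] = 0;
    }
}

/* Hardclock periodically rolls every cpu's delta into the master copy. */
static void stat_rollup(void)
{
    for (int cpu = 0; cpu < NCPUS; ++cpu) {
        global_free_count += pcpu_adj[cpu];
        pcpu_adj[cpu] = 0;
    }
}
```

The point of the force-roll is that a pcpu delta that drifts too far negative would otherwise make the global count look healthier than it is between hardclock ticks.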

6ba5daf8 | 08-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Move vm_page spin locks from pool to vm_page structure
* Move the vm_page spin lock from a pool to per-structure. This does bloat the vm_page structure, but clears up an area of contention under heavy VM loads.

070a58b3 | 07-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Implement CPU localization hinting for low level page allocations
* By default vm_page_alloc() and kmem_alloc*() localize to the calling cpu.
* A cpu override may be passed in the flags to make these functions localize differently.
* Currently implemented as a test only for the pcpu globaldata, idle thread, and stacks for kernel threads targeted to specific cpus.

6f2099fe | 06-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Add NUMA awareness to vm_page_alloc() and related functions (2)
* Fix miscellaneous bugs in the recent NUMA commits.
* Add kern.numa_disable, setting this to 1 in /boot/loader.conf will disable the NUMA code. Note that NUMA is only applicable on multi-socket systems.

c7f9edd8 | 06-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Add NUMA awareness to vm_page_alloc() and related functions
* Add NUMA awareness to the kernel memory subsystem. This first iteration will primarily affect user pages. kmalloc and objcache are not NUMA-friendly yet (and it's questionable how useful it would be to make them so).
* Tested with synth on monster (4-socket opteron / 48 cores) and a 2-socket xeon (32 threads). Appears to dole out localized pages 5:1 to 10:1.

77c48adb | 06-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor phys_avail[] and dump_avail[]
* Refactor phys_avail[] and dump_avail[] into a more understandable structure.

fde6be6a | 03-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - vm_object work
* Adjust OBJT_SWAP object management to be more SMP friendly. The hash table now uses a combined structure to reduce unnecessary cache interactions.
* Allocate VM objects via kmalloc() instead of zalloc, removing the zalloc pool for VM objects. Early initialization of the kernel does not have to access vm_object allocation functions until after basic VM initialization.
* Remove a vm_page_cache console warning that is no longer applicable. (It could be triggered by the RSS rlimit handling code).

da2da420 | 01-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix bugs in recent RSS/swap commits
* Refactor the vm_page_try_to_cache() call to take a page already busied, and fix a case where it was previously being called improperly that left a VM page permanently busy.

534ee349 | 28-Dec-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Implement RLIMIT_RSS, Increase maximum supported swap
* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS exceeds the rlimit. Currently the algorithm used to choose the pages is fairly unsophisticated (we don't have the luxury of a per-process vm_page_queues[] array).
* Implement the swap_user_async sysctl, default off. This sysctl can be set to 1 to enable asynchronous paging in the RSS code. This is mostly for testing and is not recommended since it allows the process to eat memory more quickly than it can be paged out.
* Reimplement vm.swap_burst_read so the sysctl now specifies the number of pages that are allowed to be burst. Still disabled by default (will be enabled in a followup commit).
* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.
* Refactor some of the pageout code to support synchronous direct paging, which the RSS code uses. The new code also implements a feature that will move clean pages to PQ_CACHE, making them immediately reallocatable.
* Refactor the vm_pageout_deficit variable, using atomic ops.
* Fix an issue in vm_pageout_clean() (originally part of the inactive scan) which prevented clustering from operating properly on write.
* Refactor kern/subr_blist.c and all associated code that uses it to increase swblk_t from int32_t to int64_t, and to increase the supported radix from 31 bits to 63 bits.
This increases the maximum supported swap from 2TB to some ungodly large value. Remember that, by default, space for up to 4 swap devices is preallocated so if you are allocating insane amounts of swap it is best to do it with four equal-sized partitions instead of one so kernel memory is efficiently allocated.
* There are two kernel data structures associated with swap: the blmeta structure, which has approximately a 1:8192 ratio (ram:swap) and is pre-allocated up-front, and the swmeta structure, whose KVA is reserved but not allocated.
The swmeta structure has a 1:341 ratio. It tracks swap assignments for pages in vm_object's. The kernel limits the number of structures to approximately half of physical memory, meaning that if you have a machine with 16GB of ram the maximum amount of swapped-out data you can support is 16/2*341 = 2.7TB. Not that you would actually want to eat half your ram to do that.
A large system with, say, 128GB of ram, would be able to support 128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM. The swap system can use up to 256GB of this so the maximum swap currently supported by DragonFly on a machine with > 512GB of ram is going to be 256/2*341 = 43TB. To expand this further would require some adjustments to increase the amount of KVM supported by the kernel.
* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory can be reused for swmeta but cannot be freed for use by other subsystems. You should only configure as much swap as you are willing to reserve ram for.
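The swmeta capacity arithmetic above fits in a small helper. The 1:341 ratio, the half-of-ram structure limit, and the 256GB KVA-limited ceiling are taken from the text; the function name and shape are ours, matching the worked figures (16GB → ~2.7TB, 128GB → ~21TB, >512GB → ~43TB):

```c
#include <assert.h>

#define SWMETA_RATIO   341  /* GB of swap tracked per GB of swmeta structures */
#define SWMETA_RAM_CAP 256  /* ram beyond this no longer helps: KVA-limited */

/* Maximum supported swap, in GB, for a machine with ram_gb of memory. */
static long max_swap_gb(long ram_gb)
{
    if (ram_gb > SWMETA_RAM_CAP)
        ram_gb = SWMETA_RAM_CAP;        /* the 256GB ceiling from the text */
    return ram_gb / 2 * SWMETA_RATIO;   /* swmeta limited to half of ram */
}
```

This is purely the back-of-the-envelope calculation from the commit message, not kernel code.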

a6225b5b | 28-Nov-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix deadlock in vm_page_repurpose()
* vm_page_repurpose() was hard+soft busying the underlying VM page, which can deadlock against putpages or other I/O.
* Only hard-busy the page, then add an SBUSY test to the failure case (we don't want to repurpose a page undergoing I/O after all!).

Revision tags: v4.6.1

2c9e2984 | 08-Oct-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix low memory process kill bug
* If a process is being killed, don't let it stay put in a low-memory vm_wait loop in kernel mode, it will never exit.
* Try to improve the chances that we can dump by adjusting an assertion in the user thread scheduler.

afd2da4d | 03-Aug-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove PG_ZERO and zeroidle (page-zeroing) entirely
* Remove the PG_ZERO flag and remove all page-zeroing optimizations, entirely. After doing a substantial amount of testing, these optimizations, which existed all the way back to CSRG BSD, no longer provide any benefit on a modern system.
- Pre-zeroing a page only takes 80ns on a modern cpu. vm_fault overhead in general is at least ~1 microsecond.
- Pre-zeroing a page leads to a cold cache on use, forcing the fault source (e.g. a userland program) to fetch the data from main memory when it almost certainly uses the faulted page immediately, reducing performance.
- Zeroing the page at fault-time is actually more optimal because it does not require any reading of dynamic ram and leaves the cache hot.
- Multiple synth and build tests show that active idle-time zeroing of pages actually reduces performance somewhat, and incidental allocations of already-zeroed pages (from page-table tear-downs) do not affect performance in any meaningful way.
* Remove bcopyi() and ovbcopy() -> collapse into bcopy(). These other versions existed because bcopy() used to be specially-optimized and could not be used in all situations. That is no longer true.
* Remove bcopy function pointer argument to m_devget(). It is no longer used. This function existed to help support ancient drivers which might have needed a special memory copy to read and write mapped data. It has long been supplanted by BUSDMA.

7d86823d | 01-Aug-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Cleanup vm_page_pcpu_cache()
* Remove the empty vm_page_pcpu_cache() function and related call. Page affinity is handled by the vm_page_queues[] array now.

Revision tags: v4.6.0

9002b0d5 | 30-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor cpu localization for VM page allocations (2)
* Finish up the refactoring. Localize backoffs for search failures by doing a masked domain search. This avoids bleeding into non-local page queues until we've completely exhausted our local queues, regardless of the starting pg_color index.
* We try to maintain 16-way set associativity for VM page allocations even if the topology does not allow us to do it perfectly. So, for example, a 4-socket x 12-core (48-core) opteron can break the 256 queues into 4 x 64 queues, then split the 12 cores per socket into sets of 3, giving 16 queues (the minimum) to each set of 3 cores.
* Refactor the page-zeroing code to only check the localized area. This fixes a number of issues related to the zeroed pages in the queues winding up severely unbalanced. Other cpus in the local group can help replenish a particular cpu's pre-zeroed pages but we intentionally allow a heavy user to exhaust the pages.
* Adjust the cpu topology code to normalize the physical package id. Some machines start at 1, some machines start at 0. Normalize everything to start at 0.

33ee48c4 | 30-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor cpu localization for VM page allocations
* Change how cpu localization works. The old scheme was extremely unbalanced in terms of vm_page_queue[] load.
The new scheme uses cpu topology information to break the vm_page_queue[] down into major blocks based on the physical package id, minor blocks based on the core id in each physical package, and then by 1's based on (pindex + object->pg_color).
If PQ_L2_SIZE is not big enough such that 16-way operation is attainable by physical and core id, we break the queue down only by physical id.
Note that the core id is a real core count, not a cpu thread count, so an 8-core/16-thread x 2 socket xeon system will just fit in the 16-way requirement (there are 256 PQ_FREE queues).
* When a particular queue does not have a free page, iterate nearby queues start at +/- 1 (before we started at +/- PQ_L2_SIZE/2), in an attempt to retain as much locality as possible. This won't be perfect but it should be good enough.
* Also fix an issue with the idlezero counters.
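The topology breakdown described above (major block by physical package, minor block by core, then by 1's via pindex + pg_color) can be sketched as a queue-index computation. Everything here is illustrative: the function name, the parameters, and the simplification down to a single modulo are ours, not the actual DragonFly code:

```c
#include <assert.h>

#define PQ_L2_SIZE 256              /* 256 PQ_FREE queues, per the text */
#define PQ_L2_MASK (PQ_L2_SIZE - 1)

/*
 * Pick a vm_page_queue[] index: a major block per physical package,
 * a minor block per core within the package, then spread by 1's
 * using (pindex + pg_color) within the core's block.
 */
static int pg_color_queue(int phys_id, int nphys, int core_id, int ncores,
                          long pindex, int pg_color)
{
    int per_phys = PQ_L2_SIZE / nphys;  /* major block size */
    int per_core = per_phys / ncores;   /* minor block size */
    if (per_core < 1)
        per_core = 1;                   /* degenerate topology guard */
    int base = phys_id * per_phys + core_id * per_core;
    return (base + (int)((pindex + pg_color) % per_core)) & PQ_L2_MASK;
}
```

With this shape, cpus on different sockets land in disjoint major blocks, so free-page allocation rarely contends across sockets.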

bca42d4f | 29-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Reduce memory testing and early-boot zeroing.
* Reduce the amount of memory testing and early-boot zeroing that we do, improving boot times on systems with large amounts of memory.
* Fix race in the page zeroing count.
* Refactor the VM zeroidle code. Instead of having just one kernel thread, have one on each cpu.
This significantly increases the rate at which the machine can eat up idle cycles to pre-zero pages in the cold path, improving performance of the hot-path (normal) page allocations which request zeroed pages.
* On systems with a lot of cpus there is usually a little idle time (e.g. 0.1%) on a few of the cpus, even under extreme loads. At the same time, such loads might also imply a lot of zfod faults requiring zero'd pages.
On our 48-core opteron we see a zfod rate of 1.0 to 1.5 GBytes/sec and a page-freeing rate of 1.3 - 2.5 GBytes/sec. Distributing the page-zeroing code and eating up these minuscule bits of idle time significantly improves the kernel's ability to provide a pre-zeroed page (vs having to zero it in the hot path).
Under the synth test load the kernel was still able to provide 400-700 MBytes/sec worth of pre-zeroed pages, whereas before this change the kernel was only able to provide 20 MBytes/sec worth.

Revision tags: v4.6.0rc2, v4.6.0rc, v4.7.0

962f16a7 | 19-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - repurpose buffer cache entries under heavy I/O loads
* At buffer-cache I/O loads > 200 MBytes/sec (newbuf instantiations, not cached buffer use), the buffer cache will now attempt to repurpose the VM pages in the buffer it is recycling instead of returning the pages to the VM system.
* sysctl vfs.repurposedspace may be used to adjust the I/O load limit.
* The repurposing code attempts to free the VM page then reassign it to the logical offset and vnode of the new buffer. If this succeeds, the new buffer can be returned to the caller without having to run any SMP tlb operations. If it fails, the pages will be either freed or returned to the VM system and the buffer cache will act as before.
* The I/O load limit has a secondary beneficial effect which is to reduce the allocation load on the VM system to something the pageout daemon can handle while still allowing new pages up to the I/O load limit to transfer to VM backing store. Thus, this mechanism ONLY affects systems with I/O loads above 200 MBytes/sec (or whatever programmed value you decide on).
* Pages already in the VM page cache do not count towards the I/O load limit when reconstituting a buffer.
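The repurposing decision above can be sketched as a try/fallback helper. The structure, field names, and load accounting here are invented for illustration; the 200 MBytes/sec threshold comes from the text:

```c
#include <assert.h>
#include <stdbool.h>

#define REPURPOSE_THRESHOLD_MBPS 200    /* vfs.repurposedspace analogue */

/* Simplified stand-in for a vm_page; field names are invented. */
struct vm_page_sim {
    bool busy;              /* page cannot currently be freed/reassigned */
    long obj_id, pindex;    /* current logical identity */
};

/*
 * Try to keep the buffer's page and just reassign its identity to the
 * new vnode/offset, avoiding SMP TLB operations.  Returns true on
 * success; on failure the caller falls back to returning the page to
 * the VM system as before.
 */
static bool repurpose_page(struct vm_page_sim *m, long io_load_mbps,
                           long new_obj, long new_pindex)
{
    if (io_load_mbps <= REPURPOSE_THRESHOLD_MBPS)
        return false;               /* below the load limit: old path */
    if (m->busy)
        return false;               /* can't free it: fall back */
    m->obj_id = new_obj;            /* reassign without a TLB shootdown */
    m->pindex = new_pindex;
    return true;
}
```

The threshold check is what confines the mechanism to heavy-I/O systems, as the bullets above describe.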

dc6a6bd2 | 18-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor buffer cache code in preparation for vm_page repurposing
* Keep buffer_map but no longer use vm_map_findspace/vm_map_delete to manage buffer sizes. Instead, reserve MAXBSIZE of unallocated KVM for each buffer.
* Refactor the buffer cache management code. bufspace exhaustion now has hysteresis, bufcount works just about the same.
* Start work on the repurposing code (currently disabled).

Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc

9cd626ca | 20-Aug-2015 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix several low memory+swap pageout/killproc issues
* Add significant slop to the PQAVERAGE() calculation in the pageout daemon and increase the slop in the vm_paging_target() test the pageout daemon makes to determine if it can stop early.
These adjustments fix a degenerate case when no swap is configured and a large number of clean pages are present in the inactive queue which could prevent the pageout daemon from cleaning a sufficient number of pages and cause it to start killing processes even when plenty of freeable memory exists.
* Impose a one-second delay when killing processes due to insufficient memory + swap. This reduces the chance that multiple processes will be killed even if the first one would have been sufficient, by giving the kernel more time to dispose of the process.
* Fix a bug in vm_page_alloc(). When v_free_count exactly matches v_free_reserved it successfully passes the vm_page_count_target() test but vm_page_alloc() will still fail. This results in a livelock in vm_fault_object() and will livelock the pageout daemon vs a user process stuck in vm_fault(), causing the machine to lock.
Fixed by adjusting the conditional test in vm_page_alloc().
Reported-by: luxh
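The vm_page_alloc() boundary condition above can be sketched with invented numbers: when v_free_count exactly equals v_free_reserved, the pageout daemon's test passed but the allocation itself still failed, leaving the two sides livelocked. These simplified predicates are ours, not the actual kernel tests:

```c
#include <assert.h>

static const long v_free_reserved = 128;    /* illustrative reserve */

/* Pageout daemon's view (simplified): are we at/above the reserve? */
static int page_count_ok(long v_free_count)
{
    return v_free_count >= v_free_reserved;
}

/* Old allocator test: strictly-greater, so exact equality still failed. */
static int alloc_ok_old(long v_free_count)
{
    return v_free_count > v_free_reserved;
}

/* Fixed test: agree with the pageout daemon at the boundary. */
static int alloc_ok_fixed(long v_free_count)
{
    return v_free_count >= v_free_reserved;
}
```

The livelock window is exactly the one value where the two predicates disagreed.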

Revision tags: v4.2.4, v4.3.1, v4.2.3, v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc

5a05c8a5 | 09-Jun-2015 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Increase DMA reserve from 16M to 128M by default
* People running DragonFly on workstations were having to specify more than the default 16M for vm.dma_reserved in /boot/loader.conf or their X sessions would not be stable.
* To reduce confusion, the dma_reserved default is being increased to 128M which should be sufficient for most display setups.
People with headless servers will have to explicitly reduce the reservation in /boot/loader.conf (back to 16m is my suggestion) if they wish to recover the memory.
* This is the best compromise I could think of. We can't just return the memory to the pool after boot because X might be started far later on, or even potentially killed and restarted. Other drivers might also depend on large swaths of contiguous physical memory being available.
The reserve is the best way to do it and I would rather things work out of the box rather than forcing regular users to set something in /boot/loader.conf.