History log of /dragonfly/sys/kern/vfs_cache.c (Results 1 – 25 of 212)
Revision / Date / Author / Comments
# 5479a2c1 20-Oct-2023 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Clean up cache_hysteresis contention, better trigger_syncer()

* cache_hysteresis() is no longer multi-entrant; concurrent entry caused
unnecessary contention between cpus. Only one cpu can run it at a time,
and other cpus simply return if it is already running (see the sketch
after this entry).

* Add better trigger_syncer functions: trigger_syncer_start() and
trigger_syncer_stop() elide filesystem sleeps, ensuring that a
filesystem waiting on a flush due to excessive dirty pages does not
race the flusher and wind up twiddling its thumbs while no flush is
happening.
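
For illustration, the following stand-alone C11 sketch shows one common way
to make such a routine single-entrant; the gate variable and function names
are made up and are not the actual kernel code.

    #include <stdatomic.h>

    /* Hypothetical gate flag; the real kernel may gate entry differently. */
    static atomic_int hysteresis_running;

    static void
    cache_hysteresis_sketch(void)
    {
            int expected = 0;

            /*
             * Only the caller that wins the 0 -> 1 transition proceeds;
             * every other cpu simply returns instead of contending for
             * the same work.
             */
            if (!atomic_compare_exchange_strong(&hysteresis_running,
                                                &expected, 1))
                    return;

            /* ... perform the cleanup pass exactly once here ... */

            atomic_store(&hysteresis_running, 0);
    }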



Revision tags: v6.4.0, v6.4.0rc1, v6.5.0
# a786f1c9 04-Jul-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Attempt to fix broken vfs.cache.numunres tracker (2)

* The main culprit appears to be cache_allocroot() accounting
for new root ncps differently than the rest of the module.
So anything which mounts and umounts continuously, like
dsynth, can make the numbers seriously whacky.

* Fix that and run an overnight test.



# 417b1086 04-Jul-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - check nc_generation in nlookup path

* With nc_generation now operating in a more usable manner, we can
use it in nlookup() to check for changes. When a change is detected,
the related lock will be cycled and the entire nlookup() will retry up
to debug.nlookup_max_retries, which currently defaults to 4.

* Add debugging via debug.nlookup_debug. Set to 3 for nc_generation
debugging.

* Move "Parent directory lost" kprintfs into a debugging conditional,
reported via (debug.nlookup_debug & 4).

* This fixes lookup/remove races which could sometimes cause open()
and other system calls to return EINVAL or ENOTCONN. Basically
what happened was that nlookup() wound up on an NCF_DESTROYED entry.

* A few minutes worth of a dsynth bulk does not report any random
generation number mismatches or retries, so the code in this commit
is probably very close to correct.



# cb96f7b7 04-Jul-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change ncp->nc_generation operation

* Change nc_generation operation. Bit 0 is reserved. The field is
incremented by 2 whenever major changes are being made to the ncp
(linking, unlinking, destruction, resolve, unresolve, vnode adjustment),
and then incremented by 2 again when the operation is complete.

The caller can test for a major gen change using:

    curr_gen = ncp->nc_generation & ~3;
    if ((orig_gen - curr_gen) & ~1)
            (retry needed)

* Allows unlocked/relocked code to determine whether the ncp has possibly
changed or not (will be used in upcoming commits).

* Adjust the kern_rename() code to use the generation numbers.

* Bit 0 will be used to check for a combination of major changes and
lock cycling in the future.
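
A minimal stand-alone sketch of the capture/recheck pattern shown in the
fragment above; the struct and helper names are illustrative only, not the
kernel's actual symbols.

    struct ncp_sketch {
            int nc_generation;  /* bumped by 2 at start and end of a change */
    };

    /* Capture the generation (bit 0 reserved) before dropping the lock. */
    static int
    gen_capture(const struct ncp_sketch *ncp)
    {
            return (ncp->nc_generation & ~3);
    }

    /*
     * After relocking: non-zero means a major change slipped in and the
     * caller should cycle the lock and retry the operation.
     */
    static int
    gen_changed(const struct ncp_sketch *ncp, int orig_gen)
    {
            int curr_gen = ncp->nc_generation & ~3;

            return (((orig_gen - curr_gen) & ~1) != 0);
    }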



# 8ae75bb2 03-Jul-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Attempt to fix broken vfs.cache.numunres tracker

* Try to fix a mis-count that can accumulate under heavy loads.

* In cache_setvp() and cache_setunresolved(), only adjust the
unres count for namecache entries that are linked into the
topology.



# 1201253b 11-Jun-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - namecache eviction fixes

* Fix several namecache eviction issues which interfere with nlookup*()
functions.

There is an optimization where nlookup*() avoids locking intermediate
ncp's in a path whenever possible on the assumption that the ref on
the ncp will prevent eviction. This assumption fails when the machine
is under a heavy namecache load.

Errors included spurious ENOTCONN and EINVAL error codes from file
operations.

* Refactor the namecache code to not evict resolved namecache entries
which have extra refs under normal operation. This allows nlookup*()
and other functions to operate semi-lockless for intermediate elements
in a path. However, they still obtain a ref which is a cache-unfriendly
atomic operation.

This fixes numerous weird errors that occur during heavy dsynth bulk
builds.

* Also fix a bug which evicted too many resolved namecache entries when
attempting to evict unresolved entries. This should improve performance
under heavy namecache loads a bit.



Revision tags: v6.2.2
# ba1dbd39 27-May-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix lock order reversal in cache_resolve_mp()

* This function is a helper when path lookups cross mount
boundaries.

* Locking order between namecache records and vnodes must
be { ncp, vnode }.

* Fix a lock order reversal in cache_resolve_mp() which
was doing { vnode, ncp }. This deadlock is very rare
because mount points are almost never evicted from the
namecache. However, dsynth can trigger this bug due
to its heavy use of null mounts and high concurrent path
lookup loads.
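
As a stand-alone illustration of the { ncp, vnode } ordering rule (plain
pthread mutexes as stand-ins for the kernel locks; the names are made up),
the fix amounts to backing out of the vnode lock and re-acquiring in the
sanctioned order rather than nesting the other way:

    #include <pthread.h>

    static pthread_mutex_t ncp_lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER;

    static void
    resolve_mp_order_sketch(void)
    {
            /*
             * Entered as the pre-fix code effectively was: the vnode lock
             * is held and the ncp lock is still needed.
             */
            pthread_mutex_lock(&vnode_lock);

            /*
             * Wrong: taking ncp_lock right here nests { vnode, ncp } and
             * can deadlock against a thread using the normal { ncp, vnode }
             * order.  Right: back out, then re-acquire in order.
             */
            pthread_mutex_unlock(&vnode_lock);
            pthread_mutex_lock(&ncp_lock);
            pthread_mutex_lock(&vnode_lock);

            /* ... re-validate whatever the vnode lock protected ... */

            pthread_mutex_unlock(&vnode_lock);
            pthread_mutex_unlock(&ncp_lock);
    }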



# 8938f217 29-Apr-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - vnode recycling, intermediate fix

* Fix a condition where vnlru (the vnode recycler) can live-
lock on unsuitable vnodes in the inactive list and stop
making progress, causing the system to block.

First, don't deactivate vnodes which the inactive scan won't
recycle. Vnodes which are in the namecache topology but not
at a leaf won't be recycled by the vnlru thread. Leave these
vnodes on the active queue. This prevents the inactive queue
from filling up with vnodes that it can't recycle.

Second, the active scan in vnlru() will now call
cache_inval_vp_quick() to attempt to make a vnode presentable
so it can be deactivated. The inactive scan also does the same
thing, because some leakage can happen anyway.

* The active scan should be able to make continuous progress
as successful cache_inval_vp_quick() calls make more and more
vnodes presentable that might have previously been internal nodes
in the namecache topology. So the active scan should be able to
achieve the desired balance between the active and inactive queue.

* This should also improve performance when constant recycling
is happening by moving more of the work to the active->inactive
transition and doing less work in the inactive->free
transition.

* Add cache_inval_vp_quick(), a function which attempts to trivially
disassociate a vnode from the namecache topology and will handle
any direct children if the vnode is not at a leaf (but not recursively
on its own). The definition of 'trivially' for the children is:
namecache records that can be locked non-blocking, have no additional
refs, and do not record a vnode.

* Clean up cache_unlink_parent(). Have cache_zap() use this
function instead of rerolling the same code. The cache_rename()
code winds up being slightly more complex. And now
cache_inval_vp_quick() can use the function too.



# 83fe95c9 29-Apr-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Temporary work-around for vnode recyclement problems

* vnlru deadlocks were encountered on grok while indexing ~20 million
files in deep directory trees.

* Add vfscache_unres accounting to keep track of unresolved ncp's
at the leaves of the namecache tree. Start trimming the namecache
when the unres leaf count exceeds 1/16 maxvnodes, in addition to
the other algorithms.

* Add code in vnlru to decommission vnodes with children in the namecache
when those children are trivial (e.g. unresolved, dead, or negative
entries that can be easily locked).



# 62560bbb 30-Jan-2022 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix namecache issue that can slow systems down over time

* Fix a serious dead-record issue with the namecache.

It is very common to create a blah.new file and then rename it over
an existing file, say, blah.jpg, in order to atomically replace the
existing file. Such rename-over operations can cause up to
(2 * maxvnodes) dead namecache records to build up on a
single hash slot's list.

* Over time, this could result in over a million records on a single
hash slot's list which is often scanned during namecache lookups,
causing the kernel to turn into a sludge-pile.

This was not a memory leak per se; the kernel still cleans excess
structures (above 2 * maxvnodes) up, but that just maintains the
status quo and leaves the system in a slow, poorly-responsive state.

* Fixed by proactively deleting matching dead entries during namecache
lookups. The 'live' record is typically at the beginning of the list,
so the namecache lookup now scans the hash slot's list backwards and
attempts to dispose of dead records (a generic sketch follows this
entry).
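
A generic stand-alone sketch of that idea: walk a doubly-linked chain from
the tail and unlink entries marked dead, so rename-over debris is reaped
during ordinary lookups.  The list and flag names are illustrative, not the
namecache's actual structures.

    struct nch_entry {
            struct nch_entry *prev;
            struct nch_entry *next;
            int               dead;         /* stand-in for a DESTROYED flag */
    };

    struct nch_chain {
            struct nch_entry *head;
            struct nch_entry *tail;
    };

    static void
    reap_dead_backwards(struct nch_chain *chain)
    {
            struct nch_entry *ent = chain->tail;

            while (ent != NULL) {
                    struct nch_entry *prev = ent->prev;

                    if (ent->dead) {
                            /* Unlink the dead record from the chain. */
                            if (prev != NULL)
                                    prev->next = ent->next;
                            else
                                    chain->head = ent->next;
                            if (ent->next != NULL)
                                    ent->next->prev = prev;
                            else
                                    chain->tail = prev;
                            /* ... free or recycle 'ent' here ... */
                    }
                    ent = prev;
            }
    }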



Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1
# aaf02314 08-Jun-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Make sure nl_dvp is non-NULL in a few situations

* When NLC_REFDVP is set, nl_dvp should be returned non-NULL
when the nlookup succeeds.

However, there is one case where nlookup() can succeed but nl_dvp
can be NULL, and this is when the nlookup() represents a
mount-point.

* Fix three instances where this case was not being checked and
could lead to a NULL pointer dereference / kernel panic.

* Do the full resolve treatment for cache_resolve_dvp(). In
null-mount situations where we have A/B and we null-mount B onto C,
path resolutions of C via the null mount will resolve B but
not resolve A.

This breaks an assumption that nlookup() and cache_dvpref()
make about the parent ncp having a valid vnode. In fact, the
parent ncp of B (which is A) might not, because the resolve
path for B may have bypassed it due to the presence of the null
mount.

* Should fix occasional 'mkdir /var/cache' calls that fail with
EINVAL instead of EEXIST.

Reported-by: zach
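
A hedged sketch of the defensive check described above, for a caller
requesting NLC_REFDVP; the exact flags and signatures are per
sys/nlookup.h, and the error choice here is only illustrative.

    static int
    refdvp_lookup_sketch(const char *path)
    {
            struct nlookupdata nd;
            int error;

            error = nlookup_init(&nd, path, UIO_SYSSPACE,
                                 NLC_CREATE | NLC_REFDVP);
            if (error == 0)
                    error = nlookup(&nd);

            /*
             * nlookup() can succeed with nl_dvp still NULL when the path
             * resolves to a mount point, so check before dereferencing.
             */
            if (error == 0 && nd.nl_dvp == NULL)
                    error = EINVAL;     /* or EEXIST, depending on the call */

            if (error == 0) {
                    /* ... use nd.nl_dvp as the referenced parent dir ... */
            }
            nlookup_done(&nd);
            return (error);
    }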



Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# e9dbfea1 21-Mar-2021 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add kmalloc_obj subsystem step 1

* Implement per-zone memory management to kmalloc() in the form of
kmalloc_obj() and friends. Currently the subsystem uses the same
malloc_type structure but is otherwise distinct from the normal
kmalloc(), so to avoid programming mistakes the *_obj() subsystem
post-pends '_obj' to malloc_type pointers passed into it.

This mechanism will eventually replace objcache. This mechanism is
designed to greatly reduce fragmentation issues on systems with long
uptimes.

Eventually the feature will be better integrated and I will be able
to remove the _obj stuff.

* This is an object allocator, so the zone must be dedicated to one
type of object with a fixed size. All allocations out of the zone
are of that object.

The allocator is not quite type-stable yet, but will be once existential
locks are integrated into the freeing mechanism.

* Implement a mini-slab allocator for management. Since the zones are
single-object, similar to objcache, the fixed-size mini-slabs are a
lot easier to optimize and much simpler in construction than the
main kernel slab allocator.

Uses a per-zone/per-cpu active/alternate slab with an ultra-optimized
allocation path, and a per-zone partial/full/empty list.

Also has a globaldata-based per-cpu cache of free slabs. The mini-slab
allocator frees slabs back to the same cpu they were originally
allocated from in order to retain memory locality over time.

* Implement a passive cleanup poller. This currently polls kmalloc zones
very slowly looking for excess full slabs to release back to the global
slab cache or the system (if the global slab cache is full).

This code will ultimately also handle existential type-stable freeing.

* Fragmentation is greatly reduced due to the distinct zones. Slabs are
dedicated to the zone and do not share allocation space with other zones.
Also, when a zone is destroyed, all of its memory is cleanly disposed
of and there will be no left-over fragmentation.

* Initially use the new interface for the following. These zones
tend to or can become quite big:

vnodes
namecache (but not related strings)
hammer2 chains
hammer2 inodes
tmpfs nodes
tmpfs dirents (but not related strings)



# bb8026fc 31-Jan-2021 Sascha Wildner <saw@online.de>

kernel: Fix two typos in comments.


# d6853f97 01-Nov-2020 Daniel Fojt <df@neosystem.org>

kernel: fix getcwd(3) return value for non-existing directory

In case the current directory no longer exists, properly return ENOENT
from getcwd(), as described in the manpage (a small userland demo follows
this entry).

Issue: https://bugs.dragonflybsd.org/issues/3250
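
A small stand-alone userland demonstration of the fixed behavior (the
scratch path is arbitrary): once the working directory has been removed,
getcwd(3) should fail with errno set to ENOENT.

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int
    main(void)
    {
            char buf[PATH_MAX];

            /* Create a scratch directory, enter it, then remove it. */
            if (mkdir("/tmp/getcwd_demo", 0700) == -1 && errno != EEXIST) {
                    perror("mkdir");
                    return 1;
            }
            if (chdir("/tmp/getcwd_demo") == -1 ||
                rmdir("/tmp/getcwd_demo") == -1) {
                    perror("chdir/rmdir");
                    return 1;
            }

            /* The working directory no longer exists. */
            if (getcwd(buf, sizeof(buf)) == NULL)
                    printf("getcwd: %s (ENOENT expected)\n", strerror(errno));
            else
                    printf("getcwd unexpectedly returned %s\n", buf);
            return 0;
    }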



Revision tags: v5.8.3, v5.8.2
# ad121268 27-Aug-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Deal with VOP_NRENAME races

* VOP_NRENAME() as implemented by the kernel can race any number of
ways, including deadlocking, allowing duplicate entries, and panicking
tmpfs. It typically requires a heavy test load to replicate this but
a dsynth build triggered the issue at least once.

Other recently reported tmpfs issues with log file handling might also
be affected.

* A per-mount (semi-global) lock is now obtained whenever a directory
is renamed. This helps deal with numerous MP races that can cause
lock order reversals.

Loosely taken from netbsd and linux (mjg brought me up to speed on
this). Renaming directories is fraught with issues and this fix,
while somewhat brutish, is fine. Directories are very rarely renamed
at a high rate.

* kern_rename() now proactively locks all four elements of a rename
operation (source_dir, source_file, dest_dir, dest_file) instead of
only two.

* The new locking function, cache_lock4_tondlocked(), takes no chances
on lock order reversals and will use a (currently brute-force)
non-blocking and lock cycling algorithm. Probably needs some work.

* Fix a bug in cache_nlookup() related to reusing DESTROYED entries
in the hash table. This algorithm tried to reuse the entries while
maintaining shared locks, since only the entries need to be manipulated
to reuse them. However, this resulted in lookup races which could
cause duplicate entries. The duplicate entries then triggered
assertions in TMPFS.

* nlookup now tries a little harder and will retry if the parent of an
element is flagged DESTROYED after its lock was released. DESTROYED
elements are not necessarily temporary events as an operation can wind
up running in a deleted directory and must properly fail under those
conditions.

* Use krateprintf() to reduce debug output related to rename race
reporting.

* Revamp nfsrv_rename() as well (requires more testing).

* Allow nfs_namei() to be called in a loop for retry purposes if
desired. It now detects that the nd structure is initialized
from a prior run and won't try to re-parse the mbuf (needs testing).

Reported-by: zrj, mjg



# 0df73a27 21-Aug-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove global cwd statistics counters

* These were only used for debugging purposes and interfere with
MP operation. Just remove them.


# 80d831e1 25-Jul-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor in-kernel system call API to remove bcopy()

* Change the in-kernel system call prototype to take the
system call arguments as a separate pointer, and make the
contents read-only.

    old: int sy_call_t (void *);
    new: int sy_call_t (struct sysmsg *sysmsg, const void *);

* System calls with 6 arguments or less no longer need to copy
the arguments from the trapframe to a holding structure. Instead,
we simply point into the trapframe.

The L1 cache footprint will be a bit smaller, but in simple tests
the results are not noticeably faster... maybe 1ns or so
(roughly 1%).
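
For illustration only, a handler under the new shape looks roughly like
the following; the args structure and its fields are hypothetical
placeholders (the real definitions come from syscalls.master and
sys/sysmsg.h).

    /* Hypothetical example; not an actual DragonFly system call. */
    struct example_args {
            int     fd;
            long    offset;
    };

    int
    sys_example(struct sysmsg *sysmsg, const struct example_args *uap)
    {
            /*
             * uap points directly at the arguments in the trapframe and
             * is read-only, so no bcopy() into a holding structure is
             * needed.
             */
            if (uap->fd < 0)
                    return (EBADF);
            sysmsg->sysmsg_result = 0;      /* value returned to userland */
            return (0);
    }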



Revision tags: v5.8.1
# ac9f076f 24-Apr-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove CACHE_VREF: RACED debugging output

* Remove 'CACHE_VREF: RACED' debugging output after verifying
that the race can occur and is properly handled.


# b1999ea8 28-Mar-2020 Sascha Wildner <saw@online.de>

kernel: Remove <sys/n{amei,lookup}.h> from all files that don't need it.


# c065b635 04-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Rename spinlock counter trick API

* Rename the access side of the API from spin_update_*() to
spin_access_*() to avoid confusion.


# 4acc6b1f 04-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor cache_vref() using counter trick

* Refactor cache_vref() such that it is able to validate that a vnode
(whose ref count might be 0) is not in VRECLAIM, without acquiring the
vnode lock. This is the normal case (a generic sketch of the counter
trick follows this entry).

If cache_vref() is unable to do this, it backs down to the old method
which was to get a vnode lock, validate that the vnode is not in
VRECLAIM, then release the lock.

* NOTE: In DragonFlyBSD, holding a vref on a vnode (vref, NOT vhold) will
prevent the vnode from transitioning to VRECLAIM.

* Use the new feature for nlookup's naccess() tests and for the *stat*()
series of system calls.

This significantly increases performance. However, we are not entirely
cache-contention free as both the namecache entry and the vnode are still
referenced, requiring atomic adds.
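
The 'counter trick' here is essentially a sequence-count validation; the
stand-alone sketch below shows the general shape with C11 atomics and
made-up names, and is not the kernel's spin_access_*() implementation.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct obj_sketch {
            atomic_int update_counter;      /* odd while a writer is active */
            atomic_int reclaiming;          /* stand-in for the VRECLAIM flag */
            atomic_int refs;
    };

    /*
     * Lockless attempt: take a ref only if the object was observed stable
     * and not reclaiming across the whole check.  Returns false if the
     * caller must fall back to the locked path.
     */
    static bool
    try_ref_lockless(struct obj_sketch *o)
    {
            int before = atomic_load(&o->update_counter);

            if (before & 1)                         /* writer active, bail */
                    return false;
            if (atomic_load(&o->reclaiming))        /* already reclaiming */
                    return false;
            atomic_fetch_add(&o->refs, 1);          /* optimistic ref */
            if (atomic_load(&o->update_counter) == before)
                    return true;                    /* snapshot held, success */
            atomic_fetch_sub(&o->refs, 1);          /* raced a writer, undo */
            return false;
    }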



# fc36a10b 03-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Normalize the vx_*() vnode interface

* The vx_*() vnode interface is used for initial allocations, reclaims,
and terminations.

Normalize all use cases to prevent the mixing together of the vx_*()
API and the vn_*() API. For example, vx_lock() should not be paired
with vn_unlock(), and so forth.

* Integrate an update-counter mechanism into the vx_*() API, assert
reasonability.

* Change vfs_cache.c to use an int update counter instead of a long.
The vfs_cache code can't quite use the spin-lock update counter API
yet.

Use proper atomics for load and store.

* Implement VOP_GETATTR_QUICK, meant to be a 'quick' version of
VOP_GETATTR() that only retrieves information related to permissions
and ownership. This will be fast-pathed in a later commit.

* Implement vx_downgrade() to convert an exclusive vx_lock into an
exclusive vn_lock (for vnodes). Adjust all use cases in the
getnewvnode() path.

* Remove unnecessary locks in tmpfs_getattr() and don't use
any in tmpfs_getattr_quick().

* Remove unnecessary locks in hammer2_vop_getattr() and don't use
any in hammer2_vop_getattr_quick().



# 377c06c2 03-Mar-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor vfs_cache 4/N

* Refactor cache_findmount() to operate conflict-free and cache line
bounce free for the most part. The counter trick is used to probe
cache entries and combined with a local pcpu spinlock to interlock
against unmounts.

The umount code is now a bit more expensive (it has to acquire all
pcpu umount spinlocks before cleaning the cache out).

This code is not ideal but it performs up to 6x better on multiple cpus.

* Refactor _cache_mntref() to use a 4-way set association (a generic
sketch follows this entry).

* Rewrite cache_copy() and cache_drop_and_cache()'s caching algorithm.

* Use cache_dvpref() from nlookup() instead of rolling the code twice.

* Rewrite the nlookup*() and cache_nlookup*() code to generally leave
namecache records unlocked throughout, removing one layer of shared
locks from cpu contention. Only the last element is locked.

* Refactor nlookup*()'s handling of absolute paths a bit more.

* Refactor nlookup*()'s handling of NLC_REFDVP to better-validate
that the parent directory is actually the parent directory.

This also necessitates a nlookupdata.nl_dvp check in various system
calls using NLC_REFDVP to detect the mount-point case and return
the proper error code (usually EINVAL, but e.g. mkdir would return
EEXIST).

* Clean up _cache_lock() and friends to restore the diagnostic messages
when a namecache lock stalls for too long.

* FIX: Fix bugs in nlookup*() retry code. The retry code was not properly
unwinding symlink path construction during the loop and also not properly
resetting the base directory when looping up. This primarily affects NFS.

* NOTE: Using iscsi_crc32() at the moment to get a good hash distribution.
This is obviously expensive, but at least it is per-cpu.

* NOTE: The cache_nlookup() nchpp cache still has a shared spin-lock
that will cache-line-bounce concurrent acquisitions.
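
A stand-alone sketch of a 4-way set-associative lookup cache of the sort
described for _cache_mntref(); the structures, hash and replacement policy
are illustrative only.

    #include <stddef.h>
    #include <stdint.h>

    #define MNTCACHE_SETS   64              /* arbitrary for the sketch */
    #define MNTCACHE_WAYS   4               /* 4-way set association */

    struct mntcache_entry {
            const void *key;                /* e.g. the ncp being probed */
            void       *value;              /* e.g. the cached mount pointer */
    };

    /* One of these per cpu in the real thing; a single one here. */
    static struct mntcache_entry mntcache[MNTCACHE_SETS][MNTCACHE_WAYS];

    static void *
    mntcache_lookup(const void *key)
    {
            size_t set = ((uintptr_t)key >> 4) % MNTCACHE_SETS;
            int way;

            for (way = 0; way < MNTCACHE_WAYS; ++way) {
                    if (mntcache[set][way].key == key)
                            return (mntcache[set][way].value);
            }
            return (NULL);                  /* miss: fall back to slow path */
    }

    static void
    mntcache_insert(const void *key, void *value)
    {
            size_t set = ((uintptr_t)key >> 4) % MNTCACHE_SETS;
            static int rotor;               /* trivial replacement policy */
            int way = rotor++ & (MNTCACHE_WAYS - 1);

            mntcache[set][way].key = key;
            mntcache[set][way].value = value;
    }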



# 9460074f 29-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Improve cache_fullpath(), plus cleanup

* Improve cache_fullpath(). It can use a shared lock rather than an
exclusive lock, significantly improving concurrency. Important now
since realpath() indirectly uses this function.

* Code cleanup. Remove unused vfscache_rollup_all()



Revision tags: v5.8.0
# 03081357 28-Feb-2020 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor vfs_cache 3/N

* Leave the vnode held for each linked namecache entry, allowing us to
remove all the hold/drop code for 0->1 and 1->0 lock transitions of
ncps.

This significantly simplifies the cache_lock*() and cache_unlock()
functions.

* Adjust the vnode recycling code to check v_auxrefs against
v_namecache_count instead of against 0.


