hammer2_flush.c - OpenGrok history log for /dragonfly/sys/vfs/hammer2/hammer2

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# a13468b0	21-Dec-2023	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove obsolete comments on chain statistics Remove comments for removed code from b3659de2a6ee73b51bf3edb4babfb4653134813f in 2014.
# 34fb48c2	20-Oct-2023	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Multitude of SMP contention fixes, work on flush * Change the hammer2_io RBTREE to a hash table with per-entry locks. This reduces contention in the hammer2 block I/O subsystem which u hammer2 - Multitude of SMP contention fixes, work on flush * Change the hammer2_io RBTREE to a hash table with per-entry locks. This reduces contention in the hammer2 block I/O subsystem which used to be protected by a single lock. * Change the hammer2_inode RBTREE to a hash table with per-entry locks. This reduces contention in the hammer2 inode cache which used to be protected by a single lock. * Replace the hammer2_chain LRU cache with a per-inode cluster cache, which caches the last cluster-related chain. These caches are designed to hold a deep chain with 0 refs (and thus its parent recursion) to avoid having to reconstitute and recheck the chains on every VOP. For example when doing sequential I/O on a file. Probably needs more work. * Use the new trigger_syncer_start() and trigger_syncer_end() API to fix flush waits when the frontend is be asked to do large bulk modifying operations (such as file creation). The old code still worked but could sometimes cause processes to pause for up to 30 seconds when the flush wait raced the syncer. The flush wait wound up waiting for the next filesystem sync. show more ...
# 73da1719	19-Jul-2023	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Use HAMMER2_VOLUME_BYTES for volume header size Both HAMMER2_PBUFSIZE and HAMMER2_VOLUME_BYTES are 64KiB, but HAMMER2_VOLUME_BYTES should be used for volume header size when explici sys/vfs/hammer2: Use HAMMER2_VOLUME_BYTES for volume header size Both HAMMER2_PBUFSIZE and HAMMER2_VOLUME_BYTES are 64KiB, but HAMMER2_VOLUME_BYTES should be used for volume header size when explicitly getting / reading a volume header block. It's been mix of these two. show more ...
# 523dfb54	25-Jan-2023	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove -Wunused-but-set-variable local variables Warned on Linux user space.
Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.2.0, v6.3.0, v6.0.1
# b70cecb7	04-Jul-2021	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: HAMMER2_CHAIN_BMAP* should be HAMMER2_CHAIN_BLKMAP* The freemap code uses "bmap" and "BMAP" for bitmap, so these two macros should be "BLKMAP" as the comment implies.
# 55e28d18	21-Jun-2021	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove if0'd HAMMER2_TRANS_PREFLUSH This no longer exists since 2085215738c03d949e60de63843cb91e84836eb9 in 2016.
# fb431ac4	21-Jun-2021	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove unused FLUSH_DEBUG No longer used since 8138a154be31c3db1d8bd046ca7b003a6c79c01c in 2014.
# 556042ea	13-Jun-2021	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove unused hammer2_flush_info::debug No longer used since 850d3f60f0a03e7a3f08357489acac749d6224ca in 2017. Remove the entire "hammer2_debug & 0x200" code around this as well.
# c12cfc4a	11-Jun-2021	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove unused (add used) header includes
Revision tags: v6.0.0, v6.0.0rc1, v6.1.0
# 0b738157	25-Dec-2020	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Add initial multi-volumes support for HAMMER2 This commit adds initial multi-volumes support for HAMMER2. Maximum supported volumes is 64. The feature and implementation is similar sys/vfs/hammer2: Add initial multi-volumes support for HAMMER2 This commit adds initial multi-volumes support for HAMMER2. Maximum supported volumes is 64. The feature and implementation is similar to multi-volumes support in HAMMER1. 1. ondisk changes ================= This commit bumps volume header version from 1 to 2, and adds four new volume header fields using reserved fields in version 1. Other ondisk structures are unchanged. * "volu_id" - volume id from 0 to 63, where 0 represents root volume. * "nvolumes" - number of volumes. All volumes have same the same value. * "total_size" - sum of "volu_size" in volumes. All volumes have the same value. * "volu_loff[HAMMER2_MAX_VOLUMES]" - A 512 bytes table which contains start offset of max 64 volumes within "total_size". All volumes have the same value. Version 1 volume header has 0 for above fields, so HAMMER2 internally treats "nvolumes" as 1, and "total_size" as "volu_size" to be able to handle version 1 and 2 transparently. All volumes have 4 headers, but only root volume ones are relevant. Non-root volume headers have their own unique "volu_id" and "volu_size", but other fields are unimportant and never used. Non-root volume headers have sroot blockset[i] whose type is HAMMER2_BREF_TYPE_INVALID. Non-root volume headers don't have boot/aux area, so freemap area start from offset 0. Non-root volume headers are readonly and never updated after creation. This means non-root volumes are just extra storage to extend fs size and internally make up a single virtual volume whose size is "total_size". It currently doesn't automatically upgrade an existing version 1 fs to version 2. Only newly created fs becomes version 2 for now. 2. volumes layout ================= Basically similar to HAMMER1. A first block device argument provided for newfs_hammer2(8) becomes the root volume, and if specified remaining devices extend "total_size" as non-root volumes. All volumes except for the last one have 1GiB (freemap level1) aligned "volu_size". This means each volume's start offset within "total_size" is also 1GiB (freemap level1) aligned. The start offsets of volumes are stored in volu_loff[HAMMER2_MAX_VOLUMES]. Each volu_loff[n] (0 <= n < nvolumes) represents start offset of volume n within "total_size". Unused volumes have -1 for volu_loff[n]. e.g. If a fs consists of 1 volume, volu_loff[0] has 0 and rests have -1. e.g. If a fs consists of 3 volumes, x GiB root volume, y GiB volume, and z GiB volume, volu_loff[0] has 0, volu_loff[1] has x, volu_loff[2] has x+y, and rests have -1. Low level I/O function in HAMMER2 uses this linear offsets table to determine a device vnode to use and relative offset within the device vnode, for a given blockref's "data_off". This is different from HAMMER1 where logical offset had embedded volume id bits (i.e. there were holes in logical address space). HAMMER2 needs this table to support multi- volumes without changing current logical offset mechanism. Unless all volumes are specified and mountable, mount_hammer2(8) fails like it failed in HAMMER1. This also applies to other userspace commands which require volumes specification, except for fstyp(8). 3. userspace commands ===================== Basically same as or similar to HAMMER1. * newfs_hammer2(8) takes a list of block device paths as argv[]. * mount_hammer2(8) takes block device paths or names in "a:b:c:..." format. * hammer2(8) takes block device paths or names in "a:b:c:..." format for directives which require volumes specification. This commit also adds "volume-list" directive and an ioctl command HAMMER2IOC_VOLUME_LIST, which are similar to the one in HAMMER1. * fsck_hammer2(8) takes device paths or names in "a:b:c:..." format. * fstyp(8) takes device paths in "path1:path2:path3:..." format. 4. limitations ============== * hammer2(8) "info" directive ignores multi-volumes block devices. * hammer2(8) "growfs" directive doesn't support multi-volumes fs. * fstyp(8) is unable to find PFS label via -l option if the PFS inode or its parent indirect blocks are located beyond root volume. * hammer2(8) doesn't support "volume-add" and "volume-del" directives which existed in HAMMER1, and there is currently no plan to support. show more ...
# 1eb19191	14-Oct-2020	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Cleanup invalid blockref type switch cases
Revision tags: v5.8.3, v5.8.2
# a84aaf37	06-Sep-2020	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove unused HAMMER2_CHAIN_EMBEDDED Appeared in 512beabd66d77a7cca8f73bf69030ffa90f0b9e3 in 2013, but no longer used, only appears in KASSERT.
# eb227e4d	01-Sep-2020	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Remove Debugger() call for debugging in normal paths
Revision tags: v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3
# d0e99d5d	30-Jan-2020	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix inode & chain limits, improve flush pipeline. * Reorganize VFS_MODIFYING() to avoid certain deadlock conditions and adjust hammer2 to unconditionally stall in VFS_MODIFYING() when di hammer2 - Fix inode & chain limits, improve flush pipeline. * Reorganize VFS_MODIFYING() to avoid certain deadlock conditions and adjust hammer2 to unconditionally stall in VFS_MODIFYING() when dirty limits are exceeded. Make sure VFS_MODIFYING() is called in all appropriate filesystem- modifying paths. This ensures that inode and chain structure allocation limits are adhered to. * Fix hammer2's wakeup code for the dirty inode count hystereis. This fixes a situation where stalls due to excessive dirty inodes were waiting a full second before resuming operation based on the dirty count hysteresis. The hysteresis now works as intended: (1) Trigger a sync when the dirty count reache 50% N. (2) Stall the frontend when the dirty count reaches 100% N. (3) Resume the frontend when the diirty count drops to 66% N. * Fix trigger_syncer() to guarantee that the syncer will flush the filesystem ASAP when called. If the filesystem is already in a flush, it will be flushed again. Previously if the filesystem was already in a flush it would wait one second before flushing again, which significantly reduces performance under conditions where the dirty chain limit or the dirty inode limit is constantly being hit (e.g. chown -R, etc). Reported-by: tuxillo show more ...
Revision tags: v5.6.2
# 628176c9	04-Aug-2019	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer2: Drop obsolete comments on chain "dbtree", "dbq", "core_entry", "domodify" are all removed in around 2014.
Revision tags: v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1
# 5071e670	06-Dec-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - stabilization * Adjust the chain lock a bit to bump lockcnt prior to acquiring a non-blocking shared lock, instead of afterwords, which cleans up a live-loop case in the chain unlock c hammer2 - stabilization * Adjust the chain lock a bit to bump lockcnt prior to acquiring a non-blocking shared lock, instead of afterwords, which cleans up a live-loop case in the chain unlock code. * Cleanup misc debugging. Add some inode debugging code (default disabled). * Add some crash-dump (or live) debug utilities for tracking down chains, dio's, and inodes. kgdb's macros are too slow for iterating a million chains. show more ...
# d2a41023	05-Dec-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 7/N * Increase default caps for dirty chain and dirty inode counts. The new SYNCQ semantics allow this number to be arbitrarily large, but it is still a good hammer2 - refactor filesystem sync 7/N * Increase default caps for dirty chain and dirty inode counts. The new SYNCQ semantics allow this number to be arbitrarily large, but it is still a good idea not to allow it to get out of control. NOTE: One advantage of higher caps is that it gives the frontend more time to delete temporary files. * Get rid of the old syncer speedup / write moderation mechanisms. Replace with a new VFS_MODIFYING() hook that allows the filesystem to implement moderation prior to any vnode locks being held. Remove hammer2_pfs_memory_wait() calls from VOP bodies, implement the hammer2_pfs_memory_wait() call via VFS_MODIFYING() instead. * Move the moderation wakeup for the inode count to the syncer, and change the parameter to use pmp->sideq_count. show more ...
# d0755e6d	02-Dec-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 6/N * Dependency tracking. Add modest cross-dependency grouping. This code does not track dependencies in a graph. Instead, it simply groups dependent inode hammer2 - refactor filesystem sync 6/N * Dependency tracking. Add modest cross-dependency grouping. This code does not track dependencies in a graph. Instead, it simply groups dependent inodes together. This means that dependency groups can get rather large when, for example, lots of files are being created or deleted in the same directory. * We retain the excellent dynamic inode reordering code for the syncq. When the frontend blocks on an inode that is in the syncq, the inode will be reordered to the front of the queue to reduce the frontend stall time as much as possible. * Remove the COPYQ transaction flag and related sequencing. * Fix flush sequencing for pmp->iroot. We must flush iroot's chains with HAMMER2_XOP_FSSYNC last. When iroot is dirty, the out-of-order flush of iroot that occurs before the final stage must be run without FSSYNC set, otherwise iroot's pmp->pfs_iroot_blocksets[] will not be consistent because the remaining inodes in the syncq haven't been flushed yet. * Fix a broken syncer speedup conditional. show more ...
Revision tags: v5.4.0, v5.5.0, v5.4.0rc1
# 6f445d15	13-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 4/N * Save synchronized iroot blockmaps for snapshot code, and use them in the snapshot code. * Improve dependency handling and syncq/sideq flagging for depen hammer2 - refactor filesystem sync 4/N * Save synchronized iroot blockmaps for snapshot code, and use them in the snapshot code. * Improve dependency handling and syncq/sideq flagging for dependencies. Also improve the hammer2_inode_t reordering code that allows the frontend to continue operating on dirty inodes simultaneous with a filesystem sync. * Move inode deletion into the filesystem sync code (in addition to creation), for the same reason. * Fix lost ref counts in the snapshot code which were causing umount panics. * Stabilization pass on volume flush code. Since flushes stop at inode boundaries, we must properly flush the superroot before flushing the volume header. That is, the flush sequence is: - flush inodes for PFS (flushes inode content) - flush PFS root inode (flushes through to inodes) - flush superroot inode (flushes through to PFS root) - flush volume header (flushes voulume header to superroot) Theoretically this allows the filesystem asynchronously write data and inodes flushed by the kernel's buffer cache and vnode code concurrent with a filesystem flush without messing up filesystem consistency, because these asynchronously flushed inodes are not included (or have already been flushed) in the filesystem flush that is already underway. * Filesystem consistency still not perfect (using snapshot-debug directive to test during heavy filesystem modification loads, directory entries are sometimes desynchronized from their inodes). show more ...
# 3e8408db	10-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 3/N * Attempt to guarantee filesystem consistency at all sync points. * Pretty severely hacked, and at the moment this can result in syncs which never end if th hammer2 - refactor filesystem sync 3/N * Attempt to guarantee filesystem consistency at all sync points. * Pretty severely hacked, and at the moment this can result in syncs which never end if the filesystem is busy. show more ...
# ecfe89b8	09-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 2/N * Flesh out the flush partitioning code, fixing a number of issues. * Refactor hammer2_inode_lock() and add hammer2_inode_lock4() to interlock against flush hammer2 - refactor filesystem sync 2/N * Flesh out the flush partitioning code, fixing a number of issues. * Refactor hammer2_inode_lock() and add hammer2_inode_lock4() to interlock against flushes. This is handled by blocking inode locks against SYNCQ, and reordering the inode to the front of the SYNCQ list in order to unblock as quickly as possible as the filesystem sync progresses. The result should be relatively few frontend stalls during a filesystem sync. * Disable resource caps for the moment, because synchronous operations to prevent resource limits from blowing out break the current inode_lock() code and allow vnode deadlocks to occur. To avoid deadlocks, the filesystem sync currently must clear SYNCQ before locking the inode & vnode, and if it cannot lock a vnode it must continue on with the next inode and then restart. Retried vnodes introduce a short delay to give the frontend time to work the blocking operation. This is necessary because the kernel locks vnodes before entering the H2 frontend, and we cannot safely unlock/relock them to work around this. Nor do we necessarily even have full knowledge on which vnodes the current thread has locked. * Does not yet guarantee complete filesystem consistency on-crash. show more ...
# 5afbe9d8	05-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 1/N * Change H2 to allow concurrent filesystem sync and modifying frontend operations. * FLUSH transactions no longer block modifying frontend transactions. hammer2 - refactor filesystem sync 1/N * Change H2 to allow concurrent filesystem sync and modifying frontend operations. * FLUSH transactions no longer block modifying frontend transactions. * Change filesystem sync operation to put all modified inodes on the pmp->syncq (which is also combined with any inodes on pmp->sideq), and then iterating the syncq to flush each inode. After this is done, stage 2 will flush the meta-data tree leading to each inode. This code will also handle delayed inode creation and destruction ops, which require modifications to the meta-data tree governing the inodes themselves (so we don't want the frontend to do it and interfere with the flush). * Modifying operations against inodes already queued for a filesystem sync that is in progress will now reorder the inode to the front of the filesystem sync in progress and wait for the sync on that inode to complete before proceeding. This is handled by blocking in the exclusive inode lock code. * hammer2_inode_get() does not need to pass 'dip' any more because regular inodes are inserted under the iroot (PFS root inode) and no longer inserted hierarchically. * Separate out hammer2_inode_create() into hammer2_inode_create_pfs() and hammer2_inode_create_normal(). These two forms are now distinct enough that the code is a mess if we try to leave them combined. show more ...
# c4421f07	29-Jul-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Remote xop implementation part 1 * Normalize naming conventions for XOP functions. * Change the XOP callback API to remove the hammer2_thread argument. Pass the clindex and scratch buff hammer2 - Remote xop implementation part 1 * Normalize naming conventions for XOP functions. * Change the XOP callback API to remove the hammer2_thread argument. Pass the clindex and scratch buffer in directly. * Change the XOP API to pass in a function descriptor instead of a function pointer, create prototypes for DMSG send/receive XOPs which will be used for XOP components which are DMSG based and not local-storage based. * Adjust comments. show more ...
Revision tags: v5.2.2
# 257c2728	24-May-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix kmalloc pool blowout on low-memory machines * Fix a kmalloc pool blown that can occur on low-memory machines due to too many disconnected hammer2_inode structures building up. * Was hammer2 - Fix kmalloc pool blowout on low-memory machines * Fix a kmalloc pool blown that can occur on low-memory machines due to too many disconnected hammer2_inode structures building up. * Was previously fixed for things like rm -rf and bulk renames, but not for setattr (aka chown/chmod -R ops). Reported-by: gjs278 show more ...
Revision tags: v5.2.1, v5.2.0, v5.3.0, v5.2.0rc
# 65c894ff	17-Mar-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer - Fix bugs, fix serious snapshot bug, flush adjustments * Make sure we only flush the volume header for a general sync request and not for a fsync() on /. * Fix more lock order reversals w hammer - Fix bugs, fix serious snapshot bug, flush adjustments * Make sure we only flush the volume header for a general sync request and not for a fsync() on /. * Fix more lock order reversals when translating directory entries to inodes. * Separate out spmp elements into their own list to make umount ordering easier. * Flush in three stages. (1) flush dirty filesystem inodes (2) flush PFS meta-data topology up to the filesystem inodes. (3) flush the volume root and its meta-data up to the PFS inodes. This is staging for later sync concurrency improvements. * Fix a bug where creating enough snapshots (more than 4 total PFSs) causes some PFSs to lose an important flag in their blockref, which causes flushes to stop working properly on that PFS. show more ...
12 3 4 5 6