hammer2_synchro.c - OpenGrok history log for /dragonfly/sys/vfs/hammer2/hammer2

Revision	Date	Author	Comments
# 022bb0a9	21-Feb-2022	Tomohiro Kusumi <tkusumi@netbsd.org>	sys/vfs/hammer2: Mostly trailing whitespace cleanups
Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# ecfe89b8	09-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 2/N * Flesh out the flush partitioning code, fixing a number of issues. * Refactor hammer2_inode_lock() and add hammer2_inode_lock4() to interlock against flushes. This is handled by blocking inode locks against SYNCQ, and reordering the inode to the front of the SYNCQ list in order to unblock as quickly as possible as the filesystem sync progresses. The result should be relatively few frontend stalls during a filesystem sync. * Disable resource caps for the moment, because synchronous operations to prevent resource limits from blowing out break the current inode_lock() code and allow vnode deadlocks to occur. To avoid deadlocks, the filesystem sync currently must clear SYNCQ before locking the inode & vnode, and if it cannot lock a vnode it must continue on with the next inode and then restart. Retried vnodes introduce a short delay to give the frontend time to work the blocking operation. This is necessary because the kernel locks vnodes before entering the H2 frontend, and we cannot safely unlock/relock them to work around this. Nor do we necessarily even have full knowledge on which vnodes the current thread has locked. * Does not yet guarantee complete filesystem consistency on-crash.
# 5afbe9d8	05-Nov-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - refactor filesystem sync 1/N * Change H2 to allow concurrent filesystem sync and modifying frontend operations. * FLUSH transactions no longer block modifying frontend transactions. * Change filesystem sync operation to put all modified inodes on the pmp->syncq (which is also combined with any inodes on pmp->sideq), and then iterate the syncq to flush each inode. After this is done, stage 2 will flush the meta-data tree leading to each inode. This code will also handle delayed inode creation and destruction ops, which require modifications to the meta-data tree governing the inodes themselves (so we don't want the frontend to do it and interfere with the flush). * Modifying operations against inodes already queued for a filesystem sync that is in progress will now reorder the inode to the front of the filesystem sync in progress and wait for the sync on that inode to complete before proceeding. This is handled by blocking in the exclusive inode lock code. * hammer2_inode_get() does not need to pass 'dip' any more because regular inodes are inserted under the iroot (PFS root inode) and no longer inserted hierarchically. * Separate out hammer2_inode_create() into hammer2_inode_create_pfs() and hammer2_inode_create_normal(). These two forms are now distinct enough that the code is a mess if we try to leave them combined.
# c4421f07	29-Jul-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Remote xop implementation part 1 * Normalize naming conventions for XOP functions. * Change the XOP callback API to remove the hammer2_thread argument. Pass the clindex and scratch buffer in directly. * Change the XOP API to pass in a function descriptor instead of a function pointer, create prototypes for DMSG send/receive XOPs which will be used for XOP components which are DMSG based and not local-storage based. * Adjust comments.
Revision tags: v5.2.2
# fda30e02	25-May-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix focus vs modify race * Fixes rare panics when e.g. removing large numbers (as in hundreds of millions) of directory entries. The XOP collection code holds the collected cluster but cannot safely lock it without risking a deadlock against backend operations or dead backends. The hold on the chain prevents its destruction, but does not prevent another thread from locking it and issuing hammer2_chain_modify(). * Fix bugs due to the unsafe nature of an unlocked chain's content, especially chain->data and chain->dio, by adding an interlock between frontend access to the data and backend hammer2_chain_modify() calls. Held but unlocked chains are used by the XOP API to pass chains back to the frontend. * Remove the automatic (because it is unsafe) dio synchronization in hammer2_xop_collect() and instead implement an API that the frontend can use to safely access the data. The API is hammer2_xop_gdata()/hammer2_xop_pdata(). * Remove the unsafe hammer2_cluster_rdata(). Use gdata/pdata for this too. * Rewire hammer2_inode_get() to pass-in a hammer2_xop_head instead of a hammer2_cluster so it can use the gdata/pdata API too.
# 257c2728	24-May-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix kmalloc pool blowout on low-memory machines * Fix a kmalloc pool blowout that can occur on low-memory machines due to too many disconnected hammer2_inode structures building up. * Was previously fixed for things like rm -rf and bulk renames, but not for setattr (aka chown/chmod -R ops). Reported-by: gjs278
Revision tags: v5.2.1, v5.2.0, v5.3.0, v5.2.0rc
# 68b321c1	16-Mar-2018	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - More involved refactoring of chain_repparent, cleanup * Remove unused locking flags (remove the NOLOCK and NOUNLOCK features). * Add HAMMER2_RESOLVE_NONBLOCK to hammer2_chain_lock() for use only by hammer2_chain_getparent() and hammer2_chain_repparent(). * Refactor hammer2_chain_getparent() and hammer2_chain_repparent(). Add a hot-path that uses HAMMER2_RESOLVE_NONBLOCK. If this fails we now do a much more involved tracking operation via 'reptrack' to deal with races against indirect block deletions. * Cleanup the copyright messages. * Fix an issue where a sync could be held-up indefinitely by ongoing overlapping modifying operations. * Install a proper initial inode count when creating a snapshot. * Fix a deadlock in checkdirempty(). A chain lock was winding up being ordered incorrectly.
Revision tags: v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1
# 65cacacf	07-Sep-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Implement error processing and free reserve enforcement * newfs_hammer2 calculates the correct amount of reserved space. We have to reserve 4MB per 1GB, not 4MB per 2GB, due to a snafu. This is still only 0.4% of the storage. * Flesh out HAMMER2_ERROR_* codes and make most hammer2 functions return a proper error code. * Add error handling to nearly all code that can dirty a chain, in particular to handle ENOSPC issues. Any dirty buffers that cannot be flushed will incur a write error (which in DragonFly typically causes the buffer to be retried later). Any dirty chain that cannot be flushed will remain in the topology and can be completed in a later flush if space has been freed up. We try to avoid allowing the filesystem to get into this situation in the first place, but if it does, it should be possible to flush these asynchronous modifying chains and buffers once space is freed up via bulkfree. * Relax class match requirements in the freemap allocator when the freemap gets close to full. This will allow e.g. inodes to be allocated out of DATA bitmaps and vice versa, and so forth. This fixes edge conditions where there is enough free space available but it has all been earmarked for the wrong data class. * Try to fix a bug in live_count tracking when destroying an indirect block chain or inode chain that has not yet been blockmapped due to a drop. This situation only occurs when chains cannot be flushed due to I/O errors or disk full conditions, and are then later destroyed (e.g. such as when the governing file is removed). This should fix a live_count assertion that can occur under these circumstances. See hammer2_chain_lastdrop().
* Enforce the free reserve requirement for all modifying VOP calls. Root users can nominally fill the file system to 97.5%, non-root users to 95%. At 90%, write()s will enforce bawrite() versus bdwrite() to try to avoid buffer cache flushes from actually running the filesystem out of space. This is needed because we do not actually know how much disk space is going to be needed at write() time. Deduplication and compression occur later, at buffer-flush time. * Do NOT flush the volume header when a vfs sync is unable to completely flush a device due to errors. This ensures that the underlying media does not become corrupt. * Fix an issue where bref.check.freemap.bigmask was not being properly reset to -1 when bulkfree is able to free an element. This bug prevented the allocator from recognizing that free space was available in that bitmap. * Modify bulkfree operation to use the live topology when flushing and snapshot operations fail due to errors, allowing bulkfree to run. * Nominal bulkfree operations now run on the snapshot without a transaction (more testing is needed). This theoretically should allow bulkfree to run concurrent with just about any operation including flushes. * Add a freespace tracking heuristic to reduce the overhead that modifying VOP calls incur in checking the free reserve requirement. * hammer2 show dumps additional info for freemap nodes.
# c8c0a18a	31-Aug-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - error handling 2/N (chain_lookup/chain_next) * Implement error handling for hammer2_chain_lookup() and hammer2_chain_next(). * Shim use cases for this commit. Ultimately the intent is to convert the entire error path to HAMMER2_ERROR_* codes.
# 2b8f3e7e	31-Aug-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Embed cache_index heuristic in chain structure * Embed the cache_index heuristic in the hammer2_chain structure and get rid of all the code that passed it in to various API functions. This substantially cleans-up the API. * Adjust comments for upcoming error handling work.
# 59eb0066	19-Aug-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix snapshots and multi-label mounts * Allow the same /dev/blah@DIFFERENTLABEL to be specified in a mount command, so multiple labels from the same device can be mounted. * Devfs can throw different vnodes for the same device. When matching up hammer2_dev, check devvp->v_rdev for a match as well. * Fix a number of bugs in the snapshot code that left a hammer2_inode structure hanging and caused a panic. * Fix races in admin thread flag messaging that could lead to 60-second delays during umount.
# 9dca9515	18-Aug-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Add kernel-thread-based async bulk free * Add an async bulk-free feature and a kernel thread to allow the hammer2 vfs itself to run bulk frees. * Revamp the hammer2 thread management code a bit to support the new use, and clean-up the API.
Revision tags: v4.8.1
# da0cdd33	25-Jul-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Initial HARDLINK -> DIRENT replacement code * Initial removal of the vestiges of the old embedded inode code. Inodes were moved to the root directory long ago but directories still contain dummy OBJTYPE_HARDLINK inodes instead of real directory entries to point to the moved inodes. These inodes ate 1024 bytes of disk space for each directory entry. * Remove the dummy OBJTYPE_HARDLINK inodes and replace with new BREF_TYPE_DIRENT blockrefs. These blockrefs represent directory entries, and the entire dirent will fit in the blockref (requiring no data ref) if the filename is <= 64 bytes. * This new DIRENT mechanic significantly improves performance and reduces storage overage vs the previous mechanic, for obvious reasons. Directory entries are now 128 bytes instead of 1024 bytes, and since they are collected together in indirect blocks or (if <= 4 entries) simply placed in the 4 blockrefs embedded in the directory inode, the related I/O tends to be fairly optimal. Only directory entries whose filenames are > 64 bytes long require an additional data block reference. For now, due to other constraints, we use the minimum H2 allocation size of 1KB for these, so certainly space is wasted. But in real life there aren't actually a whole lot of filenames that are that long so it should be fine.
Revision tags: v4.8.0, v4.6.2
# 0d66a712	17-Mar-2017	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Fix cluster synchronization bug (2+ nodes) * Fix a bug where hammer2_cluster_check() can end up in an infinite loop. * Look for pfs_names[] in all PFSs associated with a cluster. * Add missing xop retirement in pfs-delete. * Skip the directory empty check in pfs-delete. * Preliminary code to deallocate an element of a live PFS
Revision tags: v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 7fece146	09-Jul-2016	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Add feature to allow sector overwrite, fix meta-data check code * If a file is set to use no check code (hammer2 setcheck none <file>), data overwrites will reuse the same sector as long as it does not violate the most recent snapshot. This allows the program to relax copy-on-write requirements for certain files, for example files which might be mmap()'d SHARED+RW and then modified constantly where the programmer has determined that the possibility of corruption is ok. * Implement pfs_lsnap_tid in the PFS root inode meta-data. This records the last snapshot TID so the chain code can determine if an overwrite is allowed. * Remove attr_tid and dirent_tid from the inode meta-data for now. * Only BREF_TYPE_DATA brefs inherit the inode check mode. Meta-data brefs such as indirect blocks, or directory entries, will only use the check code type specified in the parent inode if it is not NONE. Otherwise they will use the default check code. This fixes a bug where meta-data brefs could wind up being unchecked. We want all meta-data to always be checked (at least for now).
# 660d007e	09-Jun-2016	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - Revamp worker thread signaling * Revamp how worker thread signaling works. Get rid of a number of race conditions and use atomic ops. We no longer need thr->lk. * Make hammer2_cluster_enable's scaling factor work with cluster_write() as well as cluster_read().
Revision tags: v4.4.3, v4.4.2, v4.4.1, v4.4.0, v4.5.0, v4.4.0rc
# b02c0ae6	19-Nov-2015	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - stabilization pass on slave sync (2) * Augment xop_scanall to allow flags to be passed in. * Implement HMNT2_LOCAL (-o local) flag for debugging cluster elements. * Fix missing data panic by resolving chain data in the slave sync scan. * Fix PFS installation ioctl to add the new PFS to the cluster as appropriate. * Numerous cleanups and fixes to the slave sync code which was previously ripped up by the XOPs work. Note that the slave sync code still has tons of issues and races.
# 3cbe226b	18-Nov-2015	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - stabilization pass on slave sync * Add a temporary hack which avoids a NULL pointer panic. This goes a long way to stabilizing the backend slave sync threads.
# 3f01ebaa	30-Aug-2015	Matthew Dillon <dillon@apollo.backplane.com>	hammer2 - live dedup, cleanup * First attempt at a live dedup. The H2 strategy code now caches {data_off, crc} info to track recently accessed data blocks. The cache is checked in the strategy_write code after device-level block encoding. If we get a cache hit, the disk block is compared against the write data and reused if it matches. * This 'live' dedup should catch most typical 'cp' or 'cpdup' style commands. There will also be a bulk dedup capable of catching everything. * Note that 'df' output might be a bit confusing because the 'Used' field represents the topology and does not take into account dedups. 'Avail' is calculated from the actual freemap. To make things look right the total disk size is adjusted upward so it matches Used+Avail. This mechanism will likely change. Here is an example with one copy of /usr/src and 13 copies of /usr/src. The first copy eats around 872MB, and a 'du' will show each copy eating about the same. But because of dedup each subsequent copy actually only eats around 160MB as you can see from the 'Avail' field: test40# df -h /mnt Filesystem Size Used Avail Capacity /dev/serno/WD-WX51A82J2299.s1f@LOCAL 99G 934M 99G 1% Filesystem Size Used Avail Capacity /dev/serno/WD-WX51A82J2299.s1f@LOCAL 106G 8.5G 97G 8% * Rename hammer2_bulkscan.c to hammer2_bulkfree.c since that is basically all it does. * Move the synchronization code to its own file, hammer2_synchro.c. (note: This code is currently in rip-up mode and will not operate properly).
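
Note on the live dedup described in 3f01ebaa: the commit message says the strategy code caches {data_off, crc} pairs and, on a cache hit, compares the disk block against the write data before reusing it. The following is only a toy model of that idea in Python, not the HAMMER2 kernel code. The class name DedupCache, its fields, and the use of zlib.crc32 are assumptions made for illustration; the real code is C, runs after device-level block encoding, and uses its own check codes and allocator.

```python
import zlib


class DedupCache:
    """Toy model of a live-dedup cache (hypothetical names, not HAMMER2 code)."""

    def __init__(self):
        self.by_crc = {}    # crc -> data_off of a recently accessed block
        self.disk = {}      # data_off -> block bytes (stand-in for the device)
        self.next_off = 0   # trivial bump allocator, assumed for the sketch

    def write(self, data: bytes) -> int:
        """Return the offset holding `data`, reusing a cached block if possible."""
        crc = zlib.crc32(data)
        off = self.by_crc.get(crc)
        # A CRC hit is only a hint: compare the stored block before reusing it.
        if off is not None and self.disk[off] == data:
            return off
        # Miss (or CRC collision): allocate a new block and remember its CRC.
        off = self.next_off
        self.next_off += len(data)
        self.disk[off] = data
        self.by_crc[crc] = off
        return off


cache = DedupCache()
first = cache.write(b"a" * 1024)
second = cache.write(b"a" * 1024)   # identical data: the existing offset is reused
third = cache.write(b"b" * 1024)    # different data: a new block is allocated
```

In this model a copy of already-written data consumes no new blocks, which matches the commit's observation that 'cp'-style duplicates barely reduce 'Avail'. The byte-for-byte comparison after the CRC match is what keeps a checksum collision from aliasing two different blocks.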