Revision tags: v6.2.1, v6.2.0, v6.3.0, v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1, v5.2.2, v5.2.1, v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0, v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1, v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1 |
|
#
22a0040d |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Typedef struct on declaration
Most structs in hammer are typedef'd. Some structs combine the typedef with the declaration, but others don't.
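The two styles this commit reconciles can be sketched as follows. The struct names here are hypothetical stand-ins; the only assumption drawn from HAMMER itself is that its typedefs (e.g. hammer_inode_t) are pointer typedefs:

```c
#include <assert.h>

/* Style A: typedef combined with the struct declaration. */
typedef struct example_node {
	int	count;
} *example_node_t;

/* Style B: struct declared first, typedef added separately. */
struct example_leaf {
	int	count;
};
typedef struct example_leaf *example_leaf_t;

/* Both styles yield an identical pointer typedef. */
static int
node_count(example_node_t node)
{
	return (node->count);
}
```

Either way the resulting type is the same; the commit is purely about declaring both in one place for consistency.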
|
#
562d34c2 |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Use typedef'd for struct hammer_buffer*
The hammer code is a mix of struct tags and typedef'd names. Use the typedef'd form, since the majority of the code already uses it.
|
#
877580d2 |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Use typedef'd for struct hammer_record*
The hammer code is a mix of struct tags and typedef'd names. Use the typedef'd form, since the majority of the code already uses it.
|
#
e1067862 |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Use typedef'd for struct hammer_inode*
The hammer code is a mix of struct tags and typedef'd names. Use the typedef'd form, since the majority of the code already uses it.
|
#
053f997d |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Use typedef'd for struct hammer_btree_leaf_elm*
The hammer code is a mix of struct tags and typedef'd names. Use the typedef'd form, since the majority of the code already uses it.
|
#
513f50d5 |
| 28-Aug-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Use typedef'd for union hammer_data_ondisk*
The hammer code is a mix of struct tags and typedef'd names. Use the typedef'd form, since the majority of the code already uses it.
|
Revision tags: v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0, v4.4.3 |
|
#
cf977f11 |
| 01-Mar-2016 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Don't use HAMMER_CURSOR_GET_LEAF
HAMMER_CURSOR_GET_LEAF is not used by the current implementation of hammer_btree_extract(), since this function causes cursor->leaf to point to the node element in question regardless of the flag value. So simply don't pass this flag when calling hammer_btree_extract() when we know loading the node element (leaf) is the default behavior.
The cursor flags are already complicated enough, so simplifying btree extract callers by passing either 0 or DATA, but not LEAF, makes things a bit clearer.
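The behavior described above can be sketched like this. The flag names mirror HAMMER's cursor flags, but the values, structure, and extract function are illustrative stand-ins rather than the real implementation:

```c
#include <assert.h>

#define HAMMER_CURSOR_GET_LEAF	0x0004	/* value is illustrative */
#define HAMMER_CURSOR_GET_DATA	0x0008	/* value is illustrative */

struct demo_cursor {
	int	leaf_loaded;
	int	data_loaded;
};

/*
 * The extract loads the leaf unconditionally, which is why passing
 * GET_LEAF is redundant; callers pass either 0 or GET_DATA.
 */
static int
demo_btree_extract(struct demo_cursor *cursor, int flags)
{
	cursor->leaf_loaded = 1;	/* always, regardless of flags */
	cursor->data_loaded = (flags & HAMMER_CURSOR_GET_DATA) != 0;
	return (0);
}
```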
|
Revision tags: v4.4.2, v4.4.1, v4.4.0 |
|
#
60ef395a |
| 24-Nov-2015 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Remove forward declaration of struct hammer_cmirror
|
Revision tags: v4.5.0, v4.4.0rc |
|
#
964cb30d |
| 05-Sep-2015 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
sys/vfs/hammer: Add ifndef/define/endif for headers
Some headers are missing include guards, so add them.
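The guard pattern being added looks like the following; the macro name here is a hypothetical example, as the exact guard names vary per header:

```c
#ifndef _VFS_HAMMER_EXAMPLE_H_
#define _VFS_HAMMER_EXAMPLE_H_

/* header contents go here, protected against double inclusion */
#define EXAMPLE_GUARDED 1

#endif /* !_VFS_HAMMER_EXAMPLE_H_ */

/* A second textual inclusion of the same body is now a no-op: */
#ifndef _VFS_HAMMER_EXAMPLE_H_
#error "never reached; the guard short-circuits reinclusion"
#endif
```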
|
#
f176517c |
| 28-Aug-2015 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
hammer: Remove cluster topology related comments
that were written in the early stages of hammer development but do not reflect the actual implementation today, such as super-clusters, etc.
|
Revision tags: v4.2.4, v4.3.1, v4.2.3 |
|
#
745703c7 |
| 07-Jul-2015 |
Tomohiro Kusumi <kusumi.tomohiro@gmail.com> |
hammer: Remove trailing whitespaces
- (Non-functional commits could make it difficult to git-blame the history if there are too many of those)
|
Revision tags: v4.2.1, v4.2.0, v4.0.6, v4.3.0, v4.2.0rc, v4.0.5, v4.0.4, v4.0.3, v4.0.2, v4.0.1, v4.0.0, v4.0.0rc3, v4.0.0rc2, v4.0.0rc, v4.1.0, v3.8.2, v3.8.1, v3.6.3, v3.8.0, v3.8.0rc2, v3.9.0, v3.8.0rc, v3.6.2, v3.6.1, v3.6.0, v3.7.1, v3.6.0rc, v3.7.0, v3.4.3, v3.4.2, v3.4.0, v3.4.1, v3.4.0rc, v3.5.0, v3.2.2, v3.2.1, v3.2.0, v3.3.0, v3.0.3, v3.0.2, v3.0.1, v3.1.0, v3.0.0 |
|
#
86d7f5d3 |
| 26-Nov-2011 |
John Marino <draco@marino.st> |
Initial import of binutils 2.22 on the new vendor branch
Future versions of binutils will also reside on this branch rather than continuing to create new binutils branches for each new version.
|
Revision tags: v2.12.0, v2.13.0, v2.10.1, v2.11.0, v2.10.0, v2.9.1, v2.8.2, v2.8.1, v2.8.0, v2.9.0, v2.6.3, v2.7.3, v2.6.2, v2.7.2, v2.7.1, v2.6.1, v2.7.0, v2.6.0, v2.5.1, v2.4.1, v2.5.0, v2.4.0, v2.3.2, v2.3.1 |
|
#
3214ade6 |
| 06-May-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Refactor merged search function to try to avoid missed entries
Refactor the merged B-Tree + In-Memory search function to try to avoid races where an in-memory record is flushed to the media during a search, causing the search to miss the record.
Add another flag to hammer_record_t to indicate that the record was deleted because it was committed to the media (versus simply being deleted).
Better-separate HAMMER_RECF_DELETED_FE and HAMMER_RECF_DELETED_BE. These flags indicate whether the frontend or backend deleted an in-memory record. The backend ignores frontend deletions that occur after the record has been associated with a flush group.
Remove some console Warnings that are no longer applicable.
|
Revision tags: v2.2.1, v2.2.0, v2.3.0, v2.1.1, v2.0.1 |
|
#
5c8d05e2 |
| 06-Aug-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 2.1:01 - Stability
* Fix a bug in the B-Tree code. Recursive deletions are done prior to marking a node as actually being empty, but setup for the deletion (by calling hammer_cursor_deleted_element()) must still occur prior to the recursion so cursor indexes are properly adjusted for the possible removal. If the recursion is not successful we can just leave the cursors post-adjusted since the subtree has an empty leaf anyway.
* Rename HAMMER_CURSOR_DELBTREE to HAMMER_CURSOR_RETEST so its function is more apparent.
* Properly set the HAMMER_CURSOR_RETEST flag when relocking a cursor that has tracked a ripout, so the cursor's new current element is re-tested by any iteration using the cursor.
* Remove code that allowed a SETUP record to be converted to a FLUSH record if the target inode is already in the correct flush group. The problem is that the target inode has already set up its sync state for the backend and the nlinks count will not be correct if we add another directory ADD/DEL record to the flush. While strictly a temporary nlinks mismatch (the next flush would correct it), a crash occurring here would result in inconsistent nlink counts on the media.
* Reference and release buffers instead of directly calling low level hammer_io_deallocate(), and generally reference and release buffers around reclamations in the buffer/io invalidation code to avoid races. In particular, the buffer must be referenced during a call to hammer_io_clear_modify().
* Fix a buffer leak in hammer_del_buffers() which is not only bad unto itself, but can also cause reblocking assertions on the presence of buffer aliases later on.
* Return ENOTDIR if rmdir attempts to remove a non-directory.
Reported-by: Francois Tigeot <ftigeot@wolfpond.org> (rmdir) Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx> (multiple)
|
#
4c038e17 |
| 10-Jul-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 60J/Many: Mirroring
Finish implementing the core mirroring algorithm. The last bit was to add support for no-history deletions on the master. The same support also covers masters which have pruned records away prior to the mirroring operation. As with the work done previously, the algorithm is 100% queue-less and has no age limitations. You could wait a month, and then do a mirroring update from master to slave, and the algorithm will efficiently handle it.
The basic issue that this commit tackles is what to do when records are physically deleted from the master. When this occurs the mirror master cannot provide a list of records to delete to its slaves.
The solution is to use the mirror TID propagation to physically identify swaths of the B-Tree in which a deletion MAY have taken place. The mirroring code uses this information to generate PASS and SKIP mrecords.
A PASS identifies a record (sans its data payload) that remains within the identified swath and should already exist on the target. The mirroring target does a simultaneous iteration of the same swath on the target B-Tree and deletes records not identified by the master.
A SKIP is the heart of the algorithm's efficiency. The same mirror TID stored in the B-Tree can also identify large swaths of the B-Tree for which *NO* deletions have taken place (which will be most of the B-Tree). One SKIP record can identify an arbitrarily large swath. The target uses the SKIP record to skip that swath on the target. No scan takes place. SKIP records can be generated from any internal node of the B-Tree and cover that node's entire sub-tree.
This also provides us with the feature where the retention policy can be completely different between a master and a mirror, or between mirrors. When the slave identifies a record that must be deleted through the above algorithm it only needs to mark it as historically deleted, it does not have to physically delete the record.
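The SKIP decision above can be sketched as a recursive scan. The node layout and names here are simplified stand-ins, not HAMMER's real on-media structures; the one idea carried over is that each node records the highest transaction id (mirror_tid) of any change in its subtree:

```c
#include <assert.h>

struct demo_node {
	unsigned long	 mirror_tid;	/* max TID of any change below */
	int		 nchildren;
	struct demo_node *child[4];
};

/*
 * Count SKIP records emitted for an incremental pass starting at
 * begin_tid: a subtree whose mirror_tid predates the window is
 * covered by a single SKIP no matter how large it is, so no scan
 * of that subtree takes place.
 */
static int
demo_mirror_scan(struct demo_node *node, unsigned long begin_tid)
{
	int i, skips = 0;

	if (node->mirror_tid < begin_tid)
		return (1);	/* one SKIP covers the entire subtree */
	for (i = 0; i < node->nchildren; ++i)
		skips += demo_mirror_scan(node->child[i], begin_tid);
	return (skips);
}
```

In the real algorithm PASS records are also emitted for surviving elements inside the swaths that are scanned; this sketch only shows why the scan cost tracks the amount of change rather than the size of the tree.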
|
#
adf01747 |
| 07-Jul-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 60E/Many: Mirroring, bug fixes
* Work on the mirror_tid propagation code. The code now retries on EDEADLK so propagation is guaranteed to reach the root.
* Get most of the mirror_write code working.
* Add PFS support for NFS exports. Change fid_reserved to fid_ext and use it to store the localization parameter that selects the PFS. This isn't well tested yet.
* BUGFIX: Fix a bug in vol0_last_tid updates. Flush sequences might not always update the field, creating issues with mirroring and snapshots.
* BUGFIX: Properly update the volume header CRC.
* CLEANUP: Fix some obj_id's that were u_int64_t's. They should be int64_t's.
* CLEANUP: #if 0 out unused code, remove other bits of unused code.
|
#
b3bad96f |
| 05-Jul-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 60D/Many: Mirroring, bug fixes
* Add more mirroring infrastructure.
* Fix support for unix domain sockets.
Reported-by: Gergo Szakal <bastyaelvtars@gmail.com>, Rumko <rumcic@gmail.com>
|
#
c82af904 |
| 26-Jun-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 59A/Many: Mirroring related work (and one bug fix).
* BUG FIX: Fix a bug in directory hashkey generation. The iterator could sometimes conflict with a key already on-disk and interfere with a pending deletion. The chance of this occurring was minuscule but not 0. Now fixed.
The fix also revamps the directory iterator code, moving it all to one place and removing it from two other places.
* PRUNING CHANGE: The pruning code no longer shifts the create_tid and delete_tid of adjacent records to fill gaps. This means that historical queries must either use snapshot softlinks or use a fine-grained transaction id greater than the most recent snapshot softlink.
Fine-grained historical access still works up to the first snapshot softlink.
* Clean up the cursor code responsible for acquiring the parent node.
* Add the core mirror ioctl read/write infrastructure. This work is still in progress.
- ioctl commands
- pseudofs enhancements, including st_dev munging
- mount options
- transaction id and object id conflictless allocation
- initial mirror_tid recursion up the B-Tree (not finished)
- B-Tree mirror scan optimizations to skip sub-hierarchies that do not need to be scanned (requires mirror_tid recursion to be 100% working)
|
#
bf3b416b |
| 14-Jun-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 55: Performance tuning and bug fixes - MEDIA STRUCTURES CHANGED!
* BUG-FIX: Fix a race in hammer_rel_mem_record() which could result in a machine lockup. The code could block at an inappropriate time with both the record and a dependency inode pointer left unprotected.
* BUG-FIX: The direct-write code could assert on (*error != 0) due to an incorrect conditional in the in-memory record scanning code.
* Inode data and directory entry data has been given its own zone as a stop-gap until the low level allocator can be rewritten.
* Increase the directory object-id cache from 128 entries to 1024 entries.
* General cleanup.
* Introduce a separate reblocking domain for directories: 'hammer reblock-dirs'.
|
#
47637bff |
| 07-Jun-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 53A/Many: Read and write performance enhancements, etc.
* Add hammer_io_direct_read(). For full-block reads this code allows a high-level frontend buffer cache buffer associated with the regular file vnode to directly access the underlying storage, instead of loading that storage via a hammer_buffer and bcopy()ing it.
* Add a write bypass, allowing the frontend to bypass the flusher and write full-blocks directly to the underlying storage, greatly improving frontend write performance. Caveat: See note at bottom.
The write bypass is implemented by adding a feature whereby the frontend can soft-reserve unused disk space on the physical media without having to interact (much) with on-disk meta-data structures. This allows the frontend to flush high-level buffer cache buffers directly to disk and release the buffer for reuse by the system, resulting in very high write performance.
To properly associate the reserved space with the filesystem so it can be accessed in later reads, an in-memory hammer_record is created referencing it. This record is queued to the backend flusher for final disposition. The backend disposes of the record by inserting the appropriate B-Tree element and marking the storage as allocated. At that point the storage becomes official.
* Clean up numerous procedures to support the above new features. In particular, do a major cleanup of the cached truncation offset code (this is the code which allows HAMMER to implement wholly asynchronous truncate()/ftruncate() support).
Also clean up the flusher triggering code, removing numerous hacks that had been in place to deal with the lack of a direct-write mechanism.
* Start working on statistics gathering to track record and B-Tree operations.
* CAVEAT: The backend flusher creates a significant cpu burden when flushing a large number of in-memory data records. Even though the data itself has already been written to disk, there is currently a great deal of overhead involved in manipulating the B-Tree to insert the new records. Overall write performance will only be modestly improved until these code paths are optimized.
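The reserve-then-finalize flow described above might be sketched like this. All names here are hypothetical stand-ins, not HAMMER's real API; the point is only the split between the frontend's in-memory bookkeeping and the backend's later B-Tree insertion:

```c
#include <assert.h>
#include <stdint.h>

struct demo_resv {
	int64_t	zone_off;	/* soft-reserved media offset */
	int	finalized;	/* backend inserted the B-Tree element */
};

/*
 * Frontend: soft-reserve space without touching on-disk meta-data,
 * so a buffer can be written directly to storage and released.
 */
static struct demo_resv
demo_reserve(int64_t *free_cursor, int64_t bytes)
{
	struct demo_resv r = { *free_cursor, 0 };

	*free_cursor += bytes;	/* in-memory bookkeeping only */
	return (r);
}

/*
 * Backend flusher: dispose of the queued in-memory record by making
 * the reservation official (B-Tree insert + allocation mark).
 */
static void
demo_finalize(struct demo_resv *r)
{
	r->finalized = 1;
}
```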
|
#
2f85fa4d |
| 18-May-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 46/Many: Performance pass, media changes, bug fixes.
* Add a localization field to the B-Tree element which has sorting priority over the object id.
Use the localization field to separate inode entries from file data. This allows the reblocker to cluster inode information together and greatly improves directory/stat performance.
* Enhance the reblocker to reblock internal B-Tree nodes as well as leaves.
* Enhance the reblocker by adding 'reblock-inodes' in addition to 'reblock-data' and 'reblock-btree', allowing individual types of meta-data to be independently reblocked.
* Fix a bug in hammer_bread(). The buffer's zoneX_offset field was sometimes not being properly masked, resulting in unnecessary blockmap lookups. Also add hammer_clrxlate_buffer() to clear the translation cache for a hammer_buffer.
* Fix numerous issues with hmp->sync_lock.
* Fix a buffer exhaustion issue in the pruner and reblocker due to not counting I/O's in progress as being dirty.
* Enhance the symlink implementation. Take advantage of the extra 24 bytes of space in the inode data to directly store symlinks <= 24 bytes.
* Use cluster_read() to gang read I/O's into 64KB chunks. Rely on localization and the reblocker and pruner to make doing the larger I/O's worthwhile.
These changes reduce ls -lR overhead on 43383 files (half created with cpdup, half totally randomly created with blogbench). Overhead went from 35 seconds after reblocking, before the changes, to 5 seconds after reblocking, after the changes.
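The sort-priority change can be illustrated with a comparison function. Field names and widths below are illustrative, not HAMMER's actual element layout; the assumption carried over is only that localization compares before obj_id, which is what clusters inode elements away from file data:

```c
#include <assert.h>
#include <stdint.h>

struct demo_base_elm {
	uint32_t	localization;	/* compared first */
	int64_t		obj_id;		/* compared second */
};

/* Return <0, 0, >0 in the usual comparator convention. */
static int
demo_btree_cmp(const struct demo_base_elm *a, const struct demo_base_elm *b)
{
	if (a->localization < b->localization)
		return (-1);
	if (a->localization > b->localization)
		return (1);
	if (a->obj_id < b->obj_id)
		return (-1);
	if (a->obj_id > b->obj_id)
		return (1);
	return (0);
}
```

With this ordering, every element in one localization domain sorts before any element in the next, regardless of obj_id, which is what lets the reblocker pack inode information contiguously.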
|
#
11ad5ade |
| 12-May-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 43/Many: Remove records from the media format, plus other stuff
* Get rid of hammer_record_ondisk. As HAMMER has evolved the need for a separate record structure has devolved into trivialities. Originally the idea was to have B-Tree nodes referencing records and data. The B-Tree elements were originally intended to be throw-away and the on-media records were originally intended to be the official representation of the data and contained additional meta-information such as the obj_id of a directory entry and a few additional fields related to the inode.
But once the UNDO code went in and it became obvious that the B-Tree needed to be tracked (undo-wise) along with everything else, the need for an official representation of the record as a separate media structure essentially disappeared.
Move the directory-record meta-data into the directory-entry data and move the inode-record meta-data into the inode-record data. As a single exception move the atime field to the B-Tree element itself (it replaces what used to be the record offset), in order to continue to allow atime updates to occur without requiring record rewrites. With these changes records are no longer needed at all, so remove the on-media record structure and all the related code.
* The removal of the on-media record structure also greatly improves performance.
* B-Tree elements are now the official on-media record.
* Fix a race in the extraction of the root of the B-Tree.
* Clean up the in-memory record handling API. Instead of having to construct B-Tree leaf elements we can simply embed one in the in-memory record structure (struct hammer_record), and in the inode.
|
#
4e17f465 |
| 03-May-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 40D/Many: Inode/link-count sequencer cleanup pass.
* Move the vfsync from the frontend to the backend. This allows the frontend to passively move inodes to the backend without having to actually start the flush, greatly improving performance.
* Use an inode lock to deal with directory entry syncing races between the frontend and the backend. It isn't optimal but it's ok for now.
* Massively optimize the backend code by initializing a single cursor for an inode and passing the cursor to procedures, instead of having each procedure initialize its own cursor.
* Fix a sequencing issue with the backend. While building the flush state for an inode another process could get in and initiate its own flush, screwing up the flush group and creating confusion. (hmp->flusher_lock)
* Don't lose track of HAMMER_FLUSH_SIGNAL flush requests. If we get such a request but have to flag a reflush, also flag that the reflush is to be signaled (done immediately when the current flush is done).
* Remove shared inode locks from hammer_vnops.c. Their original purpose no longer exists.
* Simplify the arguments passed to numerous procedures (hammer_ip_first(), etc).
|
#
1f07f686 |
| 02-May-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 40A/Many: Inode/link-count sequencer.
* Remove the hammer_depend structure and build the dependencies directly into the hammer_record structure.
* Attempt to implement layout rules to ensure connectivity is maintained. This means, for example, that before HAMMER can flush a newly created file it will make sure the file has namespace connectivity to the directory it was created in, recursively to the root.
NOTE: 40A destabilizes the filesystem a bit, it's going to take a few passes to get everything working properly. There are numerous issues with this commit.
|
#
b84de5af |
| 24-Apr-2008 |
Matthew Dillon <dillon@dragonflybsd.org> |
HAMMER 38A/Many: Undo/Synchronization and crash recovery
* Separate all frontend operations from all backend media synchronization. The frontend VNOPs make all changes in-memory and in the frontend buffer cache. The backend buffer cache used to manage meta-data is not touched.
- In-memory inode contains two copies of critical meta-data structures.
- In-memory record tree distinguishes between records undergoing synchronization and records not undergoing synchronization.
- Frontend buffer cache buffers are tracked to determine which ones to synchronize and which ones not to.
- Deletions are cached in-memory. Any number of file truncations simply caches the lowest truncation offset and on-media records beyond that point are ignored. Record deletions are cached as a negative entry in the in-memory record tree until the backend can execute the operation on the media.
- Frontend operations continue to have full, direct read access to the media.
* Backend synchronization to the disk media is able to take place simultaneously with frontend operations on the same inodes. This will need some tuning but it basically works.
* In-memory records are no longer removed from the B-Tree when deleted. They are marked for deletion and removed when the last reference goes away.
* An inode whose last reference is being released is handed over to the backend flusher for its final disposition.
* There are some bad hacks and debugging tests in this commit. In particular when the backend needs to do a truncation it special-cases any negative entries it finds in the in-memory record tree. Also, if a rename operation hits a deadlock it currently breaks atomicy.
* The transaction API has been simplified. The frontend no longer allocates transaction ids. Instead the backend does a full flush with a single transaction id (since that is the granularity the crash recovery code will have anyway).
|