* syncthr leaves inode locks held for the entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.

* More complex transaction sequencing and flush merging.  Right now it is
  all serialized against flushes.

* adding a new pfs - freeze and force remaster

* removing a pfs - freeze and force remaster

* bulkfree - sync between passes and enforce serialization of operation

* bulkfree - signal check, allow interrupt

* bulkfree - sub-passes when the kernel memory block isn't large enough

* bulkfree - limit kernel memory allocation for bmap space

* bulkfree - must include any detached vnodes in the scan so open unlinked
  files are not ripped out from under the system.

* bulkfree - must include all volume headers in the scan so they can be
  used for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to reduce
  unnecessary re-scans.

* Currently the check code (bref.methods / crc, sha, etc) is run every
  single blasted time a chain is locked, even if the underlying buffer
  was previously checked for that chain.  This needs an optimization to
  (significantly) improve performance.

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the pfs
  degenerate 'cluster' representing the snapshot.  This theoretically will
  also allow a snapshot to be generated inside a cluster of more than one
  node.
* snapshot copy currently also copies uuids and can confuse cluster code

* hidden dir or other dirs/files/modifications made to a PFS before
  additional cluster entries are added.

* transaction on cluster - multiple trans structures, subtrans

* inode always contains the target cluster/chain, not the hardlink

* chain refs in cluster, cluster refs

* check inode shared lock ... can end up in an endless loop when following
  a hardlink because ip->chain is not updated in the exclusive lock cycle
  when following the hardlink.

cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


 * The block freeing code.  At the very least a bulk scan is needed
   to implement freeing blocks.

 * Crash stability.  Right now the allocation table on-media is not
   properly synchronized with the flush.  This needs to be adjusted
   such that H2 can do an incremental scan on mount to fix up
   allocations as part of its crash recovery mechanism.

 * We actually have to start checking and acting upon the CRCs being
   generated.

 * Remaining known hardlink issues need to be addressed.

 * Core 'copies' mechanism needs to be implemented to support multiple
   copies on the same media.

 * Core clustering mechanism needs to be implemented to support
   mirroring and basic multi-master operation from a single host
   (multi-host requires additional network protocols and won't
   be as easy).

* make sure we aren't using a shared lock during RB_SCANs?

* overwrite in the write_file case w/compression - if the device block size
  changes the block has to be deleted and reallocated.  See
  hammer2_assign_physical() in vnops.

* freemap / clustering.  Set the block size on a 2MB boundary so the
  cluster code can be used for reading.

* need an API layer for shared buffers (unfortunately).

* add a magic number to the inode header, and the parent inode number too,
  to help with brute-force recovery.
* modifications past our flush point do not adjust vchain.  Need to make
  vchain dynamic so we can (see flush_scan2)??

* MINIOSIZE/RADIX set to 1KB for now to avoid buffer cache deadlocks
  on multiple locked inodes.  Fix so we can use LBUFSIZE!  Or,
  alternatively, allow a smaller I/O size based on the sector size
  (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.

* Snapshot / flush-synchronization point.  Remodified data that crosses
  the synchronization boundary is not currently reallocated.  See
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).

* On a fresh mount with multiple hardlinks present, separate lookups will
  result in separate vnodes pointing to separate inodes pointing to a
  common chain (the hardlink target).

  When the hardlink target consolidates upward only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()) which will be pointing to an older deleted hardlink
  target.

* The filesystem must ensure that modify_tid is not too large relative to
  the iterator in the volume header on load, or flush sequencing will
  not work properly.  We should be able to just override it, but we
  should complain if it happens.

* The kernel side needs to clean up transaction queues and make appropriate
  callbacks.

* The userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per vnode.
  (need to aggregate using a device buffer?)
* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is marked
  modified (not yet flushed).

* A deleted inode is not reachable via the tree for the volume flush but
  is still reachable via fsync/inactive/reclaim.  Its tree can be
  destroyed at that point.

* The direct write code needs to invalidate any underlying physical
  buffers.  Direct write needs to be implemented.

* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref.

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64K
  block, to avoid causing a read-before-write I/O.

* Check the flush race between upward recursion setting SUBMODIFIED and
  downward recursion checking SUBMODIFIED then locking (must clear before
  the recursion and might need additional synchronization).

* There is definitely a flush race in the hardlink implementation between
  the forwarding entries and the actual (hidden) hardlink inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created, and also when an indirect block is
  created and entries are moved into it, the directory seek position can
  become incorrect during a scan.

* When a directory entry is deleted, a directory seek position depending
  on that key can cause readdir to skip entries.

* TWO PHASE COMMIT - store two data offsets in the chain, and have
  hammer2_chain_delete() leave the chain intact if MODIFIED2 is
  set on its buffer until the flusher gets to it?


				OPTIMIZATIONS

* If a file is unlinked but its descriptor is left open and used, we
  should allow its data blocks on-media to be reused since there is no
  topology left to point at them.