* bulkfree pass needs to do a vchain flush from the root to avoid
  accidentally freeing live in-process chains.

* Need backend synchronization / serialization when the frontend detaches
  an XOP.  modify_tid tests won't be enough; the backend may wind up
  executing the XOP out of order after the detach.

* xop_start - only start synchronized elements

* See if we can remove hammer2_inode_repoint()

* FIXME - logical buffer associated with write-in-progress on the backend
  disappears once the cluster validates, even if more backend nodes
  are in progress.

* FIXME - backend ops need per-node transactions using spmp to protect
  against flush.

* FIXME - modifying backend ops are not currently validating the cluster.
  That probably needs to be done by the frontend in hammer2_xop_start().

* modify_tid handling is probably broken with the XOP code for the moment.

* embedded transactions in XOPs - interlock early completion

* remove current incarnation of EAGAIN

* mtx locks should not track the td_locks count.  They can be acquired by
  one thread and released by another.  Need an API function for exclusive
  locks.

* Convert xops and hammer2_update_spans() from cluster back into chain
  calls.

* syncthr holds inode locks for the entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.

* More complex transaction sequencing and flush merging.  Right now it
  is all serialized against flushes.
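The CLEAN-flag idea above could be sketched as follows.  This is a
minimal illustration with made-up names (`hyp_volhdr`, `HYP_VOLF_CLEAN`);
the real hammer2 volume header layout differs:

```c
#include <stdint.h>

/* Hypothetical flag bit and header; the real hammer2 volume header
 * layout differs. */
#define HYP_VOLF_CLEAN 0x0001u

struct hyp_volhdr {
    uint32_t flags;
    uint64_t mirror_tid;
    uint64_t freemap_tid;
};

/* After a successful unmount flush the freemap is fully up to date,
 * so mark the volume clean in the last volume header written. */
void hyp_unmount_flush_done(struct hyp_volhdr *vh)
{
    vh->flags |= HYP_VOLF_CLEAN;
}

/* The first modifying transaction after mount clears the flag so a
 * crash forces the normal incremental recovery scan. */
void hyp_mount_modify(struct hyp_volhdr *vh)
{
    vh->flags &= ~HYP_VOLF_CLEAN;
}

/* Mount-time decision: the recovery pass is needed only when the
 * volume was not cleanly unmounted. */
int hyp_need_recovery_scan(const struct hyp_volhdr *vh)
{
    return (vh->flags & HYP_VOLF_CLEAN) == 0;
}
```

The flag would have to be cleared before the first modifying flush, not
after it, so a crash in between still triggers recovery.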
* adding a new pfs - freeze and force remaster

* removing a pfs - freeze and force remaster

* bulkfree - sync between passes and enforce serialization of operation

* bulkfree - signal check, allow interrupt

* bulkfree - sub-passes when the kernel memory block isn't large enough

* bulkfree - limit kernel memory allocation for bmap space

* bulkfree - must include any detached vnodes in the scan so open
  unlinked files are not ripped out from under the system.

* bulkfree - must include all volume headers in the scan so they can be
  used for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to
  reduce unnecessary re-scans.

* Currently the check code (bref.methods / crc, sha, etc) is run every
  single blasted time a chain is locked, even if the underlying buffer
  was previously checked for that chain.  This needs an optimization to
  (significantly) improve performance.

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the
  pfs degenerate 'cluster' representing the snapshot.  This will
  theoretically also allow a snapshot to be generated inside a cluster
  of more than one node.

* snapshot copy currently also copies uuids and can confuse cluster code

* hidden dir or other dirs/files/modifications made to a PFS before
  additional cluster entries are added.

* transaction on cluster - multiple trans structures, subtrans

* inode always contains target cluster/chain, not hardlink

* chain refs in cluster, cluster refs

* check inode shared lock ... can end up in an endless loop when
  following a hardlink because ip->chain is not updated in the
  exclusive lock cycle when following the hardlink.

cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


 * The block freeing code.
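The redundant check-code verification noted above is usually solved by
caching a "tested" bit on the chain.  A minimal sketch with hypothetical
names (`hyp_chain`, `HYP_CHAIN_TESTED`, and a stand-in checksum instead
of the real bref.methods dispatch):

```c
#include <stdint.h>
#include <stddef.h>

#define HYP_CHAIN_TESTED 0x0001u

struct hyp_chain {
    uint32_t flags;
    uint32_t check_code;    /* value recorded when the block was written */
    const uint8_t *data;
    size_t bytes;
};

/* Stand-in check function; the real code dispatches on bref.methods
 * (iscsi-crc, sha192, ...). */
uint32_t hyp_check(const uint8_t *data, size_t bytes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < bytes; ++i)
        sum = sum * 31 + data[i];
    return sum;
}

/* Lock path: verify the check code only the first time this chain's
 * buffer is seen; later locks reuse the cached result. */
int hyp_chain_lock_check(struct hyp_chain *chain)
{
    if ((chain->flags & HYP_CHAIN_TESTED) == 0) {
        if (hyp_check(chain->data, chain->bytes) != chain->check_code)
            return -1;              /* check code mismatch */
        chain->flags |= HYP_CHAIN_TESTED;
    }
    return 0;
}

/* Must be called whenever the underlying buffer is replaced or the
 * chain data is modified, forcing re-verification on the next lock. */
void hyp_chain_drop_data(struct hyp_chain *chain)
{
    chain->flags &= ~HYP_CHAIN_TESTED;
}
```

The subtlety is in the invalidation: every path that swaps or modifies
the underlying buffer must clear the bit or stale data goes undetected.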
   At the very least a bulk scan is needed to implement freeing blocks.

 * Crash stability.  Right now the allocation table on-media is not
   properly synchronized with the flush.  This needs to be adjusted
   such that H2 can do an incremental scan on mount to fix up
   allocations as part of its crash recovery mechanism.

 * We actually have to start checking and acting upon the CRCs being
   generated.

 * Remaining known hardlink issues need to be addressed.

 * Core 'copies' mechanism needs to be implemented to support multiple
   copies on the same media.

 * Core clustering mechanism needs to be implemented to support
   mirroring and basic multi-master operation from a single host
   (multi-host requires additional network protocols and won't
   be as easy).

* make sure we aren't using a shared lock during RB_SCANs

* overwrite in the write_file case w/compression - if the device block
  size changes the block has to be deleted and reallocated.  See
  hammer2_assign_physical() in vnops.

* freemap / clustering.  Set the block size on a 2MB boundary so the
  cluster code can be used for reading.

* need an API layer for shared buffers (unfortunately).

* add a magic number to the inode header, and the parent inode number
  too, to help with brute-force recovery.

* modifications past our flush point do not adjust vchain.  Need to
  make vchain dynamic so we can (see flush_scan2).

* MINIOSIZE/RADIX set to 1KB for now to avoid buffer cache deadlocks
  on multiple locked inodes.  Fix so we can use LBUFSIZE!  Or,
  alternatively, allow a smaller I/O size based on the sector size
  (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.
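The magic-number item above is about making inodes self-identifying so a
brute-force recovery tool can find them on raw media and rebuild the
tree bottom-up via the parent inode number.  A minimal sketch; the
struct layout and magic value here are invented, not hammer2's:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical on-media inode header; the real hammer2 inode layout
 * differs.  A magic number plus the parent inode number lets a
 * recovery tool identify inode blocks and reconstruct the directory
 * topology bottom-up. */
#define HYP_INODE_MAGIC 0x48324E4Fu    /* made-up value */

struct hyp_inode_hdr {
    uint32_t magic;
    uint64_t inum;
    uint64_t parent_inum;   /* enables bottom-up tree reconstruction */
};

/* Scan a raw buffer for candidate inode headers at a fixed alignment,
 * collecting up to 'max' hits. */
int hyp_scan_inodes(const uint8_t *buf, size_t len, size_t align,
                    struct hyp_inode_hdr *out, int max)
{
    int n = 0;
    size_t off;

    for (off = 0; off + sizeof(struct hyp_inode_hdr) <= len && n < max;
         off += align) {
        struct hyp_inode_hdr h;
        memcpy(&h, buf + off, sizeof(h));
        if (h.magic == HYP_INODE_MAGIC)
            out[n++] = h;
    }
    return n;
}
```

A real tool would additionally verify the inode's check code before
trusting a hit, since the magic alone can occur in file data.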
* Snapshot / flush-synchronization point.  Remodified data that crosses
  the synchronization boundary is not currently reallocated.  See
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).

* On a fresh mount with multiple hardlinks present, separate lookups
  will result in separate vnodes pointing to separate inodes pointing
  to a common chain (the hardlink target).

  When the hardlink target consolidates upward only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()) which will be pointing to an older deleted hardlink
  target.

* The filesystem must ensure that modify_tid is not too large relative
  to the iterator in the volume header on load, or flush sequencing
  will not work properly.  We should be able to just override it, but
  we should complain if it happens.

* The kernel side needs to clean up transaction queues and make the
  appropriate callbacks.

* The userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per-vnode.
  (need to aggregate using a device buffer?)

* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is
  marked modified (not yet flushed).

* A deleted inode is not reachable via the tree for a volume flush but
  is still reachable via fsync/inactive/reclaim.  Its tree can be
  destroyed at that point.

* The direct write code needs to invalidate any underlying physical
  buffers.  Direct write needs to be implemented.
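The modify_tid override-and-complain behavior described above amounts to
a load-time clamp.  A minimal sketch with a hypothetical function name:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical load-time sanity check: if a chain's modify_tid is
 * beyond the volume header's transaction iterator, override it to the
 * iterator but complain, since it indicates an inconsistency that
 * would otherwise break flush sequencing. */
uint64_t hyp_clamp_modify_tid(uint64_t modify_tid, uint64_t volhdr_tid)
{
    if (modify_tid > volhdr_tid) {
        fprintf(stderr,
                "warning: modify_tid %llu exceeds volume header "
                "iterator %llu, overriding\n",
                (unsigned long long)modify_tid,
                (unsigned long long)volhdr_tid);
        return volhdr_tid;
    }
    return modify_tid;
}
```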
* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref.

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64K
  block, to avoid causing a read-before-write I/O.

* Check the flush race between upward recursion setting SUBMODIFIED and
  downward recursion checking SUBMODIFIED then locking (must clear
  before the recursion and might need additional synchronization).

* There is definitely a flush race in the hardlink implementation
  between the forwarding entries and the actual (hidden) hardlink
  inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created, and also if an indirect block is
  created and entries are moved into it, the directory seek position
  can potentially become incorrect during a scan.

* When a directory entry is deleted, a directory seek position
  depending on that key can cause readdir to skip entries.

* TWO PHASE COMMIT - store two data offsets in the chain, and have
  hammer2_chain_delete() leave the chain intact if MODIFIED2 is set on
  its buffer until the flusher gets to it?


			    OPTIMIZATIONS

* If a file is unlinked but its descriptor is left open and used, we
  should allow its data blocks on-media to be reused since there is no
  topology left to point at them.
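For the directory seek-position items above, a common mitigation is to
make the readdir cookie a key and resume at the first key greater than
or equal to the cookie, so a deleted entry cannot cause surviving
entries to be skipped.  A minimal sketch, with a sorted key array
standing in for the directory index (names are hypothetical):

```c
#include <stdint.h>
#include <stddef.h>

/* Resume a scan at the first key >= cookie.  If the cookie's own entry
 * was deleted between readdir calls, the scan continues with the next
 * surviving entry instead of skipping past it.  keys[] stands in for
 * a sorted directory index. */
size_t hyp_readdir_resume(const uint64_t *keys, size_t nkeys,
                          uint64_t cookie)
{
    size_t i;

    for (i = 0; i < nkeys; ++i) {
        if (keys[i] >= cookie)
            break;
    }
    return i;       /* index of the first entry to return */
}
```

This does not by itself address entries being moved into a newly
created indirect block mid-scan; that requires the iteration keys to be
stable across such reorganizations.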