* Need backend synchronization / serialization when the frontend detaches
  a XOP.  modify_tid tests won't be enough, the backend may wind up
  executing the XOP out of order after the detach.

* xop_start - only start synchronized elements.

* See if we can remove hammer2_inode_repoint().

* FIXME - logical buffer associated with a write-in-progress on the
  backend disappears once the cluster validates, even if more backend
  nodes are in progress.

* FIXME - backend ops need per-node transactions using spmp to protect
  against flush.

* FIXME - modifying backend ops are not currently validating the cluster.
  That probably needs to be done by the frontend in hammer2_xop_start().

* modify_tid handling is probably broken with the XOP code for the moment.

* embedded transactions in XOPs - interlock early completion.

* remove the current incarnation of EAGAIN.

* mtx locks should not track the td_locks count.  They can be acquired by
  one thread and released by another.  Need an API function for exclusive
  locks.

* Convert xops and hammer2_update_spans() from cluster calls back into
  chain calls.

* syncthr holds inode locks for the entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.

* More complex transaction sequencing and flush merging.  Right now
  everything is serialized against flushes.
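As a minimal sketch of the CLEAN-flag idea from the recovery-scan item
above: set a flag in the volume header on a successful unmount flush, and
clear it again at mount so a crash before the next clean unmount still
forces recovery.  The flag name and header layout here are hypothetical,
modeled in userspace, not the actual on-media format.

```c
#include <assert.h>

#define HAMMER2_VOLF_CLEAN	0x0001	/* hypothetical flag bit */

struct volhdr {
	unsigned int flags;		/* stand-in for the real header */
};

/* On a successful unmount flush, mark the volume clean. */
static void
unmount_flush(struct volhdr *vh, int flush_ok)
{
	if (flush_ok)
		vh->flags |= HAMMER2_VOLF_CLEAN;
}

/*
 * On mount, skip the freemap recovery pass when the volume was cleanly
 * unmounted.  Clear the flag immediately so a crash before the next
 * clean unmount forces recovery again.  Returns 1 if recovery ran.
 */
static int
mount_recover(struct volhdr *vh)
{
	if (vh->flags & HAMMER2_VOLF_CLEAN) {
		vh->flags &= ~HAMMER2_VOLF_CLEAN;
		return 0;		/* recovery pass skipped */
	}
	return 1;			/* full recovery scan */
}
```

The key property is that the flag is only ever set after a fully
successful flush, so a stale CLEAN bit can never survive a crash.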
* adding a new pfs - freeze and force remaster.

* removing a pfs - freeze and force remaster.

* bulkfree - sync between passes and enforce serialization of operation.

* bulkfree - signal check, allow interrupt.

* bulkfree - sub-passes when the kernel memory block isn't large enough.

* bulkfree - limit kernel memory allocation for bmap space.

* bulkfree - must include any detached vnodes in the scan so open
  unlinked files are not ripped out from under the system.

* bulkfree - must include all volume headers in the scan so they can be
  used for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to reduce
  unnecessary re-scans.

* Currently the check code (bref.methods / crc, sha, etc) is run every
  single blasted time a chain is locked, even if the underlying buffer
  was previously checked for that chain.  This needs an optimization to
  (significantly) improve performance.

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the
  pfs degenerate 'cluster' representing the snapshot.  This theoretically
  will also allow a snapshot to be generated inside a cluster of more
  than one node.

* snapshot copy currently also copies uuids and can confuse cluster code.

* hidden dir or other dirs/files/modifications made to the PFS before
  additional cluster entries are added.

* transaction on cluster - multiple trans structures, subtrans.

* inode always contains the target cluster/chain, not the hardlink.

* check inode shared lock ... can end up in an endless loop when
  following a hardlink because ip->chain is not updated in the exclusive
  lock cycle when following the hardlink.

* chain refs in cluster, cluster refs.

cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


    * The block freeing code.
      At the very least a bulk scan is needed to implement freeing
      blocks.

    * Crash stability.  Right now the allocation table on-media is not
      properly synchronized with the flush.  This needs to be adjusted
      such that H2 can do an incremental scan on mount to fix up
      allocations as part of its crash recovery mechanism.

    * We actually have to start checking and acting upon the CRCs being
      generated.

    * Remaining known hardlink issues need to be addressed.

    * Core 'copies' mechanism needs to be implemented to support multiple
      copies on the same media.

    * Core clustering mechanism needs to be implemented to support
      mirroring and basic multi-master operation from a single host
      (multi-host requires additional network protocols and won't
      be as easy).

* make sure we aren't using a shared lock during RB_SCANs.

* overwrite in the write_file case w/compression - if the device block
  size changes the block has to be deleted and reallocated.  See
  hammer2_assign_physical() in vnops.

* freemap / clustering.  Set the block size on a 2MB boundary so the
  cluster code can be used for reading.

* need an API layer for shared buffers (unfortunately).

* add a magic number to the inode header, and the parent inode number
  too, to help with brute-force recovery.

* modifications past our flush point do not adjust vchain.  Need to make
  vchain dynamic so we can (see flush_scan2).

* MINIOSIZE/RADIX is set to 1KB for now to avoid buffer cache deadlocks
  on multiple locked inodes.  Fix so we can use LBUFSIZE!  Or,
  alternatively, allow a smaller I/O size based on the sector size
  (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed, in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.
* Snapshot / flush-synchronization point.  Remodified data that crosses
  the synchronization boundary is not currently reallocated.  See
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).

* On a fresh mount with multiple hardlinks present, separate lookups
  will result in separate vnodes pointing to separate inodes pointing to
  a common chain (the hardlink target).

  When the hardlink target consolidates upward only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()) which will be pointing to an older deleted hardlink
  target.

* The filesystem must ensure that modify_tid is not too large relative
  to the iterator in the volume header, on load, or flush sequencing
  will not work properly.  We should be able to just override it, but we
  should complain if it happens.

* Kernel-side needs to clean up transaction queues and make appropriate
  callbacks.

* Userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per-vnode.
  (need to aggregate using a device buffer?)

* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is
  marked modified (not yet flushed).

* A deleted inode is not reachable via the tree for a volume flush but
  is still reachable via fsync/inactive/reclaim.  Its tree can be
  destroyed at that point.

* The direct write code needs to invalidate any underlying physical
  buffers.  Direct write needs to be implemented.
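The modify_tid sanity check above can be modeled in a few lines:
override the loaded value when it exceeds the volume header's iterator,
but complain, since that indicates inconsistent media.  The function and
field names here are illustrative, not the real on-media layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t hammer2_tid_t;

/*
 * Clamp a loaded modify_tid to the volume header's transaction
 * iterator.  Returns the (possibly corrected) value and warns when it
 * had to override.
 */
static hammer2_tid_t
clamp_modify_tid(hammer2_tid_t modify_tid, hammer2_tid_t vol_tid_iterator)
{
	if (modify_tid > vol_tid_iterator) {
		fprintf(stderr,
		    "hammer2: modify_tid %ju > volume iterator %ju, "
		    "clamping\n",
		    (uintmax_t)modify_tid, (uintmax_t)vol_tid_iterator);
		modify_tid = vol_tid_iterator;
	}
	return modify_tid;
}
```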
* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref.

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64K
  block, to avoid causing a read-before-write I/O.

* Check the flush race between upward recursion setting SUBMODIFIED and
  downward recursion checking SUBMODIFIED then locking (must clear
  before the recursion and might need additional synchronization).

* There is definitely a flush race in the hardlink implementation
  between the forwarding entries and the actual (hidden) hardlink inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created, and also if an indirect block is
  created and entries are moved into it, the directory seek position
  can potentially become incorrect during a scan.

* When a directory entry is deleted, a directory seek position depending
  on that key can cause readdir to skip entries.

* TWO PHASE COMMIT - store two data offsets in the chain, and have
  hammer2_chain_delete() leave the chain intact if MODIFIED2 is set on
  its buffer until the flusher gets to it?

				OPTIMIZATIONS

* If a file is unlinked but its descriptors are left open and used, we
  should allow data blocks on-media to be reused since there is no
  topology left to point at them.
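As a userspace model of the read-before-write avoidance described in the
freemap item above: when a sub-64KB allocation is the first one out of a
brand-new 64K block there is nothing valid on media, so the buffer can
simply be zeroed (the clrbuf step) and dirtied instead of being read
back in.  The real code would operate on the device vnode with
getblk/bdwrite; this sketch only models the decision.

```c
#include <assert.h>
#include <string.h>

#define BIGBLOCK 65536			/* 64K allocation granule */

/*
 * Prepare a buffer backing a partial allocation.  For a fresh 64K
 * block, zero it in memory instead of issuing a device read.  Returns
 * 1 when the read-before-write was avoided, 0 when the caller must
 * read the existing block before partially overwriting it.
 */
static int
prepare_block(char *buf, int is_new_big_block)
{
	if (is_new_big_block) {
		memset(buf, 0, BIGBLOCK);	/* clrbuf equivalent */
		return 1;			/* no read issued */
	}
	return 0;				/* must read existing data */
}
```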