* Need backend synchronization / serialization when the frontend detaches
  an XOP.  modify_tid tests won't be enough; the backend may wind up
  executing the XOP out of order after the detach.

* xop_start - only start synchronized elements

* See if we can remove hammer2_inode_repoint()

* FIXME - the logical buffer associated with a write-in-progress on the
  backend disappears once the cluster validates, even if more backend
  nodes are still in progress.

* FIXME - backend ops need per-node transactions using spmp to protect
  against flushes.

* FIXME - modifying backend ops are not currently validating the cluster.
  That probably needs to be done by the frontend in hammer2_xop_start().

* modify_tid handling is probably broken with the XOP code for the moment.

* embedded transactions in XOPs - interlock early completion

* remove the current incarnation of EAGAIN

* mtx locks should not track the td_locks count.  They can be acquired by
  one thread and released by another.  Need an API function for exclusive
  locks.

* Convert xops and hammer2_update_spans() from cluster back into chain calls

* syncthr leaves inode locks held for the entire sync, which is wrong.

* recovery scan vs unmount.  At the moment an unmount does its flushes,
  and if successful the freemap will be fully up-to-date, but the mount
  code doesn't know that and the last flush batch will probably match
  the PFS root mirror_tid.  If it was a large cpdup, the (unnecessary)
  recovery pass at mount time can be extensive.  Add a CLEAN flag to the
  volume header to optimize out the unnecessary recovery pass.
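
  A minimal sketch of the proposed CLEAN flag, using a stand-in volume
  header (the flag name, fields and layout below are assumptions, not the
  real hammer2_volume_data_t):

        #include <stdint.h>
        #include <stdbool.h>

        #define H2_VOLF_CLEAN   0x0001u         /* hypothetical flag bit */

        struct h2_volhdr_stub {
                uint64_t        mirror_tid;     /* unused in this sketch */
                uint32_t        flags;
        };

        /*
         * Set on a clean unmount after the final flush has brought the
         * freemap fully up-to-date; cleared again as soon as a mounted
         * filesystem is modified, so a crash always leaves it off.
         */
        static void
        volhdr_mark_clean(struct h2_volhdr_stub *vh)
        {
                vh->flags |= H2_VOLF_CLEAN;
        }

        /* Mount can skip the freemap recovery pass when the flag is set. */
        static bool
        volhdr_need_recovery(const struct h2_volhdr_stub *vh)
        {
                return ((vh->flags & H2_VOLF_CLEAN) == 0);
        }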

* More complex transaction sequencing and flush merging.  Right now it is
  all serialized against flushes.

* adding a new pfs - freeze and force remaster

* removing a pfs - freeze and force remaster

* bulkfree - sync between passes and enforce serialization of operation

* bulkfree - signal check, allow interrupt

* bulkfree - sub-passes when the kernel memory block isn't large enough

* bulkfree - limit kernel memory allocation for bmap space

* bulkfree - must include any detached vnodes in the scan so open unlinked
             files are not ripped out from under the system.

* bulkfree - must include all volume headers in the scan so they can be used
             for recovery or automatic snapshot retrieval.

* bulkfree - snapshot duplicate sub-tree cache and tests needed to reduce
             unnecessary re-scans.

* Currently the check code (bref.methods / crc, sha, etc) is being run
  every single blasted time a chain is locked, even if the underlying buffer
  was previously checked for that chain.  This needs an optimization to
  (significantly) improve performance.
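
  A minimal sketch of one way to cache the result, using a stand-in chain
  structure (the flag and field names below are assumptions, not the
  in-tree chain API):

        #include <stddef.h>
        #include <stdbool.h>
        #include <stdint.h>

        #define H2_CHAIN_TESTED 0x0001u /* data already verified once */

        struct h2_chain_stub {
                uint32_t        flags;
                const void      *data;  /* backing data the check would verify */
                size_t          bytes;
        };

        /* Placeholder for the real bref.methods check (crc, sha, ...). */
        static bool
        chain_check_data(const struct h2_chain_stub *chain)
        {
                (void)chain;
                return (true);
        }

        /*
         * Called with the chain locked and its buffer resident.  The
         * expensive check method runs only the first time; the cached
         * flag must be cleared whenever the chain's data is replaced.
         */
        static bool
        chain_verify_once(struct h2_chain_stub *chain)
        {
                if (chain->flags & H2_CHAIN_TESTED)
                        return (true);
                if (!chain_check_data(chain))
                        return (false);
                chain->flags |= H2_CHAIN_TESTED;
                return (true);
        }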

* flush synchronization boundary crossing check and current flush chain
  interlock needed.

* snapshot creation must allocate and separately pass a new pmp for the pfs
  degenerate 'cluster' representing the snapshot.  This theoretically will
  also allow a snapshot to be generated inside a cluster of more than one
  node.

* snapshot copy currently also copies uuids and can confuse cluster code

* hidden dir or other dirs/files/modifications made to a PFS before
  additional cluster entries are added.

* transaction on cluster - multiple trans structures, subtrans

* inode always contains target cluster/chain, not hardlink

* chain refs in cluster, cluster refs

* check inode shared lock ... can end up in an endless loop when following
  a hardlink because ip->chain is not updated in the exclusive lock cycle.

cpdup /build/boomdata/jails/bleeding-edge/usr/share/man/man4 /mnt/x3


        * The block freeing code.  At the very least a bulk scan is needed
          to implement freeing blocks.

        * Crash stability.  Right now the allocation table on-media is not
          properly synchronized with the flush.  This needs to be adjusted
          such that H2 can do an incremental scan on mount to fix up
          allocations as part of its crash recovery mechanism.

        * We actually have to start checking and acting upon the CRCs being
          generated.

        * Remaining known hardlink issues need to be addressed.

        * Core 'copies' mechanism needs to be implemented to support multiple
          copies on the same media.

        * Core clustering mechanism needs to be implemented to support
          mirroring and basic multi-master operation from a single host
          (multi-host requires additional network protocols and won't
          be as easy).

* make sure we aren't using a shared lock during RB_SCANs?

* overwrite in the write_file case with compression - if the device block
  size changes the block has to be deleted and reallocated.  See
  hammer2_assign_physical() in vnops.
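
  A minimal sketch of that reallocation rule, using stand-in types and
  helpers (nothing below corresponds to the actual vnops code path):

        #include <stdint.h>
        #include <stdlib.h>

        struct h2_chain_stub {
                uint64_t        key;    /* logical file offset */
                int             radix;  /* log2 of the device block size */
        };

        /* Trivial stand-ins for the real delete / allocate paths. */
        static void
        chain_delete_stub(struct h2_chain_stub *chain)
        {
                free(chain);
        }

        static struct h2_chain_stub *
        chain_alloc_stub(uint64_t key, int radix)
        {
                struct h2_chain_stub *nchain;

                nchain = calloc(1, sizeof(*nchain));
                if (nchain != NULL) {
                        nchain->key = key;
                        nchain->radix = radix;
                }
                return (nchain);
        }

        /*
         * On an overwrite the compressed result may need a different
         * device block size than the existing physical block.  The block
         * cannot be resized in place; it has to be deleted and a fresh
         * block allocated at the new radix.
         */
        static struct h2_chain_stub *
        assign_physical_stub(struct h2_chain_stub *chain, int new_radix)
        {
                uint64_t key = chain->key;

                if (chain->radix == new_radix)
                        return (chain);         /* same size, reuse in place */
                chain_delete_stub(chain);
                return (chain_alloc_stub(key, new_radix));
        }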

* freemap / clustering.  Set the block size on a 2MB boundary so the cluster
  code can be used for reading.

* need an API layer for shared buffers (unfortunately).

* add a magic number to the inode header, and the parent inode number too,
  to help with brute-force recovery.
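
  A minimal sketch of the idea, using a stand-in on-media header (the
  magic value, field names and layout are hypothetical, not the real
  hammer2_inode_data_t):

        #include <stdint.h>
        #include <stdbool.h>

        #define H2_INODE_MAGIC  0x48324e4fU     /* hypothetical value */

        struct h2_inode_hdr_stub {
                uint32_t        magic;  /* proposed: identifies inode blocks */
                uint64_t        inum;
                uint64_t        pinum;  /* proposed: parent inode number */
                /* ... remainder of the inode data ... */
        };

        /*
         * A brute-force media scan can key on the magic number to find
         * candidate inodes, then use pinum to reattach each one under
         * its parent directory when rebuilding the topology.
         */
        static bool
        looks_like_inode(const void *blk)
        {
                const struct h2_inode_hdr_stub *ip = blk;

                return (ip->magic == H2_INODE_MAGIC);
        }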

* modifications past our flush point do not adjust vchain.
  Need to make vchain dynamic so we can (see flush_scan2)?

* MINIOSIZE/RADIX set to 1KB for now to avoid buffer cache deadlocks
  on multiple locked inodes.  Fix so we can use LBUFSIZE!  Or,
  alternatively, allow a smaller I/O size based on the sector size
  (not optimal though).

* When making a snapshot, do not allow the snapshot to be mounted until
  the in-memory chain has been freed in order to break the shared core.

* Snapshotting a sub-directory does not snapshot any
  parent-directory-spanning hardlinks.

* Snapshot / flush-synchronization point.  Remodified data that crosses
  the synchronization boundary is not currently reallocated.  See
  hammer2_chain_modify(), explicit check (requires logical buffer cache
  buffer handling).
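
  A minimal sketch of the missing check, using stand-in fields (the names
  and the comparison below are assumptions about the intent, not the
  actual hammer2_chain_modify() logic):

        #include <stdbool.h>
        #include <stdint.h>

        struct h2_chain_stub {
                uint64_t        modify_tid;     /* tid of last modification */
        };

        /*
         * If the chain was last modified on or before the current flush
         * synchronization point, modifying it in place would change data
         * the flush/snapshot is still entitled to see, so the block must
         * be reallocated (copy-on-write) rather than remodified in place.
         */
        static bool
        must_reallocate_across_sync(const struct h2_chain_stub *chain,
                                    uint64_t flush_sync_tid)
        {
                return (chain->modify_tid <= flush_sync_tid);
        }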

* On a fresh mount with multiple hardlinks present, separate lookups will
  result in separate vnodes pointing to separate inodes pointing to a
  common chain (the hardlink target).

  When the hardlink target consolidates upward, only one vp/ip will be
  adjusted.  We need code to fix up the other chains (probably put in
  inode_lock_*()), which will be pointing to an older, deleted hardlink
  target.

* The filesystem must ensure, on load, that modify_tid is not too large
  relative to the iterator in the volume header, or flush sequencing will
  not work properly.  We should be able to just override it, but we should
  complain if it happens.

* The kernel side needs to clean up transaction queues and make appropriate
  callbacks.

* The userland side needs to do the same for any initiated transactions.

* Nesting problems in the flusher.

* Inefficient vfsync due to thousands of file buffers, one per vnode.
  (need to aggregate using a device buffer?)

* Use bp->b_dep to interlock the buffer with the chain structure so the
  strategy code can calculate the crc and assert that the chain is marked
  modified (not yet flushed).

* A deleted inode is not reachable via the tree for the volume flush but is
  still reachable via fsync/inactive/reclaim.  Its tree can be destroyed at
  that point.

* The direct write code needs to invalidate any underlying physical buffers.
  Direct write needs to be implemented.

* Make sure a resized block (hammer2_chain_resize()) calculates a new
  hash code in the parent bref.

* The freemap allocator needs to getblk/clrbuf/bdwrite any partial
  block allocations (less than 64KB) that allocate out of a new 64K
  block, to avoid causing a read-before-write I/O.
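
  A sketch of the intended sequence, using local stand-ins (the real
  getblk()/clrbuf()/bdwrite() kernel routines have different signatures;
  everything below is illustrative only):

        #include <stdint.h>
        #include <string.h>

        #define H2_PBUFSIZE     65536           /* 64KB physical block */

        struct stub_buf {
                char    data[H2_PBUFSIZE];
        };

        static struct stub_buf stub_cache_slot; /* pretend buffer cache */

        /* Stand-in for getblk(): instantiate a buffer without reading it. */
        static struct stub_buf *
        stub_getblk(uint64_t devoff, int size)
        {
                (void)devoff;
                (void)size;
                return (&stub_cache_slot);
        }

        /* Stand-in for clrbuf(): zero the buffer and treat it as valid. */
        static void
        stub_clrbuf(struct stub_buf *bp)
        {
                memset(bp->data, 0, sizeof(bp->data));
        }

        /* Stand-in for bdwrite(): queue a delayed write of the buffer. */
        static void
        stub_bdwrite(struct stub_buf *bp)
        {
                (void)bp;
        }

        /*
         * When a sub-64KB allocation lands in a freshly allocated 64KB
         * block, prime the whole device buffer as zeroed-and-dirty so a
         * later strategy write finds a valid buffer instead of forcing a
         * read-before-write of the 64KB block.
         */
        static void
        prime_new_block(uint64_t devoff)
        {
                struct stub_buf *bp;

                bp = stub_getblk(devoff & ~(uint64_t)(H2_PBUFSIZE - 1),
                                 H2_PBUFSIZE);
                stub_clrbuf(bp);
                stub_bdwrite(bp);
        }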

* Check flush race: upward recursion setting SUBMODIFIED vs downward
  recursion checking SUBMODIFIED then locking (must clear before the
  recursion and might need additional synchronization).

* There is definitely a flush race in the hardlink implementation between
  the forwarding entries and the actual (hidden) hardlink inode.

  This will require us to associate a small hard-link-adjust structure
  with the chain whenever we create or delete hardlinks, on top of
  adjusting the hardlink inode itself.  Any actual flush to the media
  has to synchronize the correct nlinks value based on whether related
  created or deleted hardlinks were also flushed.

* When a directory entry is created, and also if an indirect block is
  created and entries are moved into it, the directory seek position can
  potentially become incorrect during a scan.

* When a directory entry is deleted, a directory seek position depending
  on that key can cause readdir to skip entries.

* TWO PHASE COMMIT - store two data offsets in the chain, and
  hammer2_chain_delete() needs to leave the chain intact if MODIFIED2 is
  set on its buffer until the flusher gets to it?
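
  A minimal sketch of the two-offset idea, using stand-in names (the
  fields and the MODIFIED2-style flag below are assumptions about the
  proposal, not existing code):

        #include <stdbool.h>
        #include <stdint.h>

        #define H2_CHAIN_MODIFIED2      0x0001u /* second phase still pending */

        struct h2_chain_stub {
                uint32_t        flags;
                uint64_t        data_off;       /* committed (phase 1) offset */
                uint64_t        data_off2;      /* pending (phase 2) offset */
        };

        /*
         * A delete may not tear the chain down while the flusher still
         * needs the committed offset; the chain can only be freed once
         * the flusher has retired the pending phase and cleared the flag.
         */
        static bool
        chain_delete_may_free(const struct h2_chain_stub *chain)
        {
                return ((chain->flags & H2_CHAIN_MODIFIED2) == 0);
        }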


                                OPTIMIZATIONS

* If a file is unlinked but its descriptors are left open and used, we
  should allow data blocks on-media to be reused since there is no
  topology left to point at them.
219