# $NetBSD: CHANGES,v 1.5 2005/12/11 12:25:26 christos Exp $

kernel:

- Instead of blindly continuing when it encounters an inode that is
  locked by another process, lfs_markv will process the rest of the
  inodes passed to it and then return EAGAIN. The cleaner will
  recognize this and not mark the segment clean. When the cleaner runs
  again, the segment containing the (formerly) locked inode will sort
  high for cleaning, since it is now almost entirely empty.

- Work has begun on testing the storage of atime information in the
  Ifile, instead of in the inodes. This should make read-mostly
  filesystems significantly faster, since the inodes will then remain
  close to the data blocks on disk; but of course the Ifile will be
  somewhat larger. This code is not enabled, because it changes the
  Ifile format.

- The superblock has been broken into two components: an on-disk
  superblock using fixed-size types, exactly 512 bytes regardless of
  architecture (though it could be enlarged in multiples of the media
  block size up to LFS_SBPAD); and an in-memory superblock containing
  the information only useful to a running LFS, including segment
  pointers, etc. The superblock checksumming code has been modified to
  make future changes to the superblock format easier.

- Because of the way that lfs_writeseg works, buffers are freed before
  they are really written to disk: their contents are copied into large
  buffers which are written asynchronously. Because the buffer cache
  does not serve to throttle these writes, and malloced memory is used
  to hold them, there is a danger of running out of kmem_map. To avoid
  this, a new compile-time parameter, LFS_THROTTLE, sets an upper bound
  on the number of partial-segments that may be in the process of being
  written at any given time.

- If the system crashes between the point that a checkpoint is scheduled
  for writing and the time that the write completes, the filesystem
  could be left in an inconsistent state (no valid checkpoints on
  disk). To avoid this, we toggle between the first two superblocks
  when checkpointing, and (if it is indicated that no roll-forward agent
  exists) do not allow one checkpoint to occur before the last one has
  completed. When the filesystem is mounted, it uses the *older* of the
  first two superblocks.

- DIROPs:

  The design of the LFS includes segregating vnodes used in directory
  operations, so that they can be written at the same time during a
  checkpoint, avoiding filesystem inconsistency after a crash. Code for
  this was partially written for BSD4.4, but was not complete or enabled.

  In particular, vnodes marked VDIROP could be flushed by getnewvnode at
  any time, negating the usefulness of marking a vnode VDIROP, since if
  the filesystem then crashed it would be inconsistent. Now, when a
  vnode is first marked VDIROP it is also referenced. To avoid running
  out of vnodes, an attempt to mark more than LFS_MAXDIROP vnodes with
  VDIROP will sleep, and trigger a partial-segment write when no dirops
  are active.

- LFS maintains a linked list of free inode numbers in the Ifile;
  accesses to this list are now protected by a simple lock.

- lfs_vfree is not allowed to run while an inode has blocks scheduled
  for writing, since that could trigger a miscounting in lfs_truncate.

- lfs_balloc now correctly extends fragments, if a block is written
  beyond the current end-of-file.

- Blocks which have already been gathered into a partial-segment are not
  allowed to be extended, since if they were, any blocks following them
  would either be written in the wrong place, or overwrite other blocks.

- The LFS buffer-header accounting, which triggers a partial-segment
  write if too many buffer-headers are in use by the LFS subsystem, has
  been expanded to include *bytes* used in LFS buffers as well.

- Reads of the Ifile, which almost always come from the cleaner, can no
  longer trigger a partial-segment write, since this could cause a
  deadlock.

- Support has been added (but not tested, and currently disabled by
  default) for true read-only filesystems. Currently, if a filesystem
  is mounted read-only the cleaner can still operate on it, but this
  obviously would not be true for read-only media. (I think the
  original plan was for the roll-forward agent to operate using this
  "feature"?)

- If a fake buffer is created by lfs_markv and another process draws the
  same block in and changes it, the fake buffer is now discarded and
  replaced by the "real" buffer containing the new data.

- An inode which has blocks gathered no longer has IN_MODIFIED set, but
  still does in fact have dirty blocks attached. lfs_update will now
  wait for such an inode's writes to complete before it runs,
  suppressing a panic in vinvalbuf.

- Many filesystem operations now update the Ifile's mtime, allowing the
  cleaner to detect when the filesystem is idle, and clean more
  vigorously during such times (cf. Blackwell et al., 1995).

- When writing a partial-segment, make sure that the current segment is
  still marked ACTIVE afterward (otherwise the cleaner might try to
  clean it, since it might well be mostly empty).

- Don't trust the cleaner so much. Sort the blocks during gathering,
  even if they came from the cleaner; verify the location of on-disk
  inodes, even if the cleaner says it knows where they came from.

- The cleaning code (lfs_markv in particular) has been entirely
  rewritten, and the partial-segment writing code changed to match.
  lfs_markv no longer uses its own implementation of lfs_segwrite, but
  marks inodes with IN_CLEANING to differentiate them from the
  non-cleaning inodes. This change fixes numerous problems with the old
  cleaner, including a buffer overrun, and lost extensions in active
  fragments. lfs_bmapv looks up and returns the addresses of inode
  blocks, so the cleaner can do something intelligent with them.

  If IN_CLEANING is set on an inode during partial-segment write, only fake
  buffers will be written, and IN_MODIFIED will not be cleared, saving
  us from a panic in vinvalbuf. The addition of IN_CLEANING also allows
  dirops to be active while cleaning is in progress; otherwise,
  buffers engaged in active dirops might be written ahead of schedule,
  and cause an inconsistent checkpoint to be written to disk.

  (XXX - even now, DIROP blocks can sometimes be written to disk, if we
  are cleaning the same blocks as are active? Grr, I don't see a good
  solution for this!)

- Added sysctl entries for LFS. In particular, `writeindir' controls
  whether indirect blocks are written during non-checkpoint writes.
  (Since there is no roll-forward agent as yet, there is no penalty in
  not writing indirect blocks.)

- Wake up the cleaner at fs-unmount time, so it can die (if we unmount
  and then remount, we could conceivably get more than one cleaner
  operating at once).

newfs_lfs:

- The ifile inode is now created with the schg flag set, since nothing
  ever modifies it. This could be a pain for the roll-forward agent,
  but since that should really run *before* the filesystem is mounted,
  I don't care.

- For large disks, it may be necessary to write one or more indirect
  blocks when the ifile inode is created. Newlfs has been changed to
  write the first indirect block, if necessary.
  It should instead just build a set of inodes and blocks, and then use
  the partial-segment writing routine mentioned above to write an ifile
  of whatever size is desired.

lfs_cleanerd:

- Now writes information to the syslog.

- Can now deal properly with fragments.

- Sometimes, the cleaner can die. (Why?) If this happens and we don't
  notice, we're screwed, since the fs will overfill. So, the invoked
  cleaner now spawns itself repeatedly, a la init(8), to ensure that a
  cleaner is always present to clean the fs.

- Added a flag to clean more actively, not on low load average but
  filesystem inactivity; a la Blackwell et al., 1995.

fsck_lfs:

- Exists, although it currently cannot actually fix anything (it is a
  diagnostic tool only at this point).
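
Illustration (not part of the kernel sources):

  The superblock alternation described under "kernel:" above can be
  sketched roughly as follows. This is a minimal sketch under stated
  assumptions, not the LFS implementation: struct sketch_sb, its stamp
  field, sketch_next_slot and sketch_mount_slot are all hypothetical
  names, and the real on-disk superblock carries many more fields.
  The reading that the older superblock is chosen because the newer
  one may belong to a checkpoint whose write never completed is an
  inference from the entry above, not something it states outright.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one of the first two
 * superblocks; only an ordering stamp is modelled here. */
struct sketch_sb {
	uint64_t stamp;		/* assumed: time or serial of the checkpoint */
};

/* Slot (0 or 1) to write at the next checkpoint. Alternating slots
 * means the previous checkpoint's superblock is not overwritten. */
static int
sketch_next_slot(int last_slot)
{
	assert(last_slot == 0 || last_slot == 1);
	return 1 - last_slot;
}

/* Slot to use at mount time: the older of the two superblocks.
 * Ties go to slot 0, which is an arbitrary choice for this sketch. */
static int
sketch_mount_slot(const struct sketch_sb sb[2])
{
	return (sb[1].stamp < sb[0].stamp) ? 1 : 0;
}
```

  With stamps 10 and 20 in slots 0 and 1, sketch_mount_slot picks
  slot 0; once slot 0 is rewritten with stamp 30, it picks slot 1.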