.\"
.\" Copyright (c) 2008
.\" The DragonFly Project.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man5/hammer.5,v 1.15 2008/11/02 18:56:47 swildner Exp $
.\"
.Dd November 2, 2008
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd options HAMMER
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
fine-grained history retention,
mirroring capability, and pseudo file systems.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems are brought back into a fully coherent state
when mounted, usually within a few seconds.
.Ss Large File Systems & Multi Volume
A
.Nm
file system can span up to 256 volumes.
Each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
For volumes over 2 TB in size,
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Ss Data Integrity Checking
.Nm
puts a high priority on data integrity:
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime for files accessed via
a snapshot, and the
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Transaction IDs
The
.Nm
file system uses 64 bit, hexadecimal transaction IDs to refer to historical
file or directory data.
An ID has the
.Xr printf 3
format
.Li %#016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Ar synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files or directories are generally accessible by appending
.Li @@
and a transaction ID to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar history ,
.Ar snapshot ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
Only history used by the given snapshots and history from after the latest
snapshot will be retained.
All other history is deleted.
Reblocking reorders all elements, thus defragmenting the file system and
freeing space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Ar nohistory
.Xr mount_hammer 8
option.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar prune ,
.Ar prune-everything ,
.Ar reblock ,
.Ar reblock-btree ,
.Ar reblock-inodes ,
.Ar reblock-dirs ,
.Ar reblock-data
.Ss Mirroring & Pseudo File Systems
In order to allow inode numbers to be duplicated on the slaves,
.Nm Ap s
mirroring feature uses
.Dq Pseudo File Systems
(PFSs).
A
.Nm
file system supports up to 65535 PFSs.
Multiple slaves per master are supported, but multiple masters per slave
are not.
Slaves are always read-only.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Related
.Xr hammer 8
commands:
.Ar pfs-master ,
.Ar pfs-slave ,
.Ar pfs-cleanup ,
.Ar pfs-status ,
.Ar pfs-update ,
.Ar pfs-destroy ,
.Ar pfs-upgrade ,
.Ar pfs-downgrade ,
.Ar mirror-copy ,
.Ar mirror-stream ,
.Ar mirror-read ,
.Ar mirror-read-stream ,
.Ar mirror-write ,
.Ar mirror-dump
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts.
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Do not export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount of the PFS root
(e.g.\&
.Pa /hammer/data
above)
should be exported,
as the subdirectory may be escaped if the containing directory is exported.
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic cleaning up by taking snapshots, pruning, and
reblocking, in order to keep history accessible and to keep the file
system from filling up.
For this it is recommended to use the
.Xr hammer 8
.Ar cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Ar cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Ar snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
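.Pp
A specific prior version of a file can also be reached directly by
appending
.Li @@
and a transaction ID, as reported by the
.Xr hammer 8
.Ar history
command, to its name
(the path and transaction ID below are merely illustrative):
.Bd -literal -offset indent
hammer history /home/user/file
cp /home/user/file@@0x00000001061a8ba6 /tmp/file.old
.Ed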
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8 Ap s
.Ar prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link and is not from after the latest snapshot.
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring can be set up using
.Nm Ap s
pseudo file systems.
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Ar pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable until a mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Ar mirror-read
command into a
.Ar mirror-write
or, as a short-cut, use the
.Ar mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
cannot access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step, a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
This example NFS exports, from the
.Nm
file system
.Pa /hammer ,
the directory
.Pa /hammer/non-pfs
(which contains no PFSs) and the PFS
.Pa /hammer/pfs/data ;
the latter is null mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh SEE ALSO
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr ffs 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system, or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to lead to excessive priority given
to writes.
.Pp
There are four bioq sysctls which can be adjusted to give reads a higher
priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that
.Fa kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and
.Fa kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
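.Pp
For example, such a tuning might be applied with
.Xr sysctl 8
(the values shown are merely a starting point taken from the
recommendation above, not a definitive setting):
.Bd -literal -offset indent
sysctl kern.bioq_reorder_minor_interval=15
sysctl kern.bioq_reorder_burst_bytes=262144
.Ed
.Pp
To make the change permanent, the same variables can be set in
.Xr sysctl.conf 5 .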
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com .
This manual page was written by
.An Sascha Wildner .