.\"
.\" Copyright (c) 2008
.\" The DragonFly Project.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd April 8, 2010
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd options HAMMER
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
fine-grained history retention,
mirroring capability, and pseudo file systems.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes,
each of which occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
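.Pp
As an illustrative sketch (device and mount point names are hypothetical),
a file system spanning two volumes can be mounted at boot with an
.Xr fstab 5
entry following the format shown in the
.Sx SYNOPSIS :
.Bd -literal -offset indent
/dev/ad0s1d:/dev/ad1s1d /data hammer rw 2 0
.Ed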
.Ss Data Integrity Checking
.Nm
has a strong focus on data integrity:
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Transaction IDs
The
.Nm
file system uses 64-bit, hexadecimal transaction IDs to refer to historical
file or directory data.
An ID has the
.Xr printf 3
format
.Li %#016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Ar snapshot ,
.Ar synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files or directories are generally accessible by appending
.Li @@
and a transaction ID to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar history ,
.Ar snapshot ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Ar nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Ar cleanup ,
.Ar snapshot ,
.Ar prune ,
.Ar prune-everything ,
.Ar rebalance ,
.Ar reblock ,
.Ar reblock-btree ,
.Ar reblock-inodes ,
.Ar reblock-dirs ,
.Ar reblock-data
.Ss Mirroring & Pseudo File Systems
In order to allow inode numbers to be duplicated on the slaves,
.Nm Ap s
mirroring feature uses
.Dq Pseudo File Systems
(PFSs).
A
.Nm
file system supports up to 65535 PFSs.
Multiple slaves per master are supported, but multiple masters per slave
are not.
Slaves are always read-only.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Related
.Xr hammer 8
commands:
.Ar pfs-master ,
.Ar pfs-slave ,
.Ar pfs-cleanup ,
.Ar pfs-status ,
.Ar pfs-update ,
.Ar pfs-destroy ,
.Ar pfs-upgrade ,
.Ar pfs-downgrade ,
.Ar mirror-copy ,
.Ar mirror-stream ,
.Ar mirror-read ,
.Ar mirror-read-stream ,
.Ar mirror-write ,
.Ar mirror-dump
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts.
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
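.Pp
As a minimal sketch, using the paths from the example above, the PFS can be
made available for export with:
.Bd -literal -offset indent
mount_null /hammer/pfs/data /hammer/data
.Ed
.Pp
A complete
.Xr fstab 5
and
.Xr exports 5
setup is shown in the
.Sx EXAMPLES
section below.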
.Pp
Do not export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount of a PFS root
(e.g.\&
.Pa /hammer/data
above)
should be exported
(a PFS subdirectory may be escaped if its containing directory is exported).
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic cleanup (taking snapshots, pruning, and reblocking)
in order to retain access to history and to keep the file system from
filling up.
For this it is recommended to use the
.Xr hammer 8
.Ar cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Ar cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Ar snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8 Ap s
.Ar prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link and is not from after the latest snapshot.
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring can be set up using
.Nm Ap s
pseudo file systems.
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Ar pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link remains unusable until an initial mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Ar mirror-read
command into a
.Ar mirror-write
or, as a shortcut, use the
.Ar mirror-copy
command (which works across a
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done using the PFS path (as
.Xr mount_null 8
cannot access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
Suppose we want to NFS export, from the
.Nm
file system
.Pa /hammer ,
the directory
.Pa /hammer/non-pfs
(which contains no PFSs) and the PFS
.Pa /hammer/pfs/data ;
the latter is null mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/sysarch09/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system, or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to lead to excessive priority being
given to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com .
This manual page was written by
.An Sascha Wildner .