1.\" 2.\" Copyright (c) 2008 3.\" The DragonFly Project. All rights reserved. 4.\" 5.\" Redistribution and use in source and binary forms, with or without 6.\" modification, are permitted provided that the following conditions 7.\" are met: 8.\" 9.\" 1. Redistributions of source code must retain the above copyright 10.\" notice, this list of conditions and the following disclaimer. 11.\" 2. Redistributions in binary form must reproduce the above copyright 12.\" notice, this list of conditions and the following disclaimer in 13.\" the documentation and/or other materials provided with the 14.\" distribution. 15.\" 3. Neither the name of The DragonFly Project nor the names of its 16.\" contributors may be used to endorse or promote products derived 17.\" from this software without specific, prior written permission. 18.\" 19.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 22.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 23.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 24.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, 25.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 26.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 27.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT 29.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 30.\" SUCH DAMAGE. 
31.\" 32.Dd August 14, 2012 33.Dt HAMMER 5 34.Os 35.Sh NAME 36.Nm HAMMER 37.Nd HAMMER file system 38.Sh SYNOPSIS 39To compile this driver into the kernel, 40place the following line in your 41kernel configuration file: 42.Bd -ragged -offset indent 43.Cd "options HAMMER" 44.Ed 45.Pp 46Alternatively, to load the driver as a 47module at boot time, place the following line in 48.Xr loader.conf 5 : 49.Bd -literal -offset indent 50hammer_load="YES" 51.Ed 52.Pp 53To mount via 54.Xr fstab 5 : 55.Bd -literal -offset indent 56/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0 57.Ed 58.Sh DESCRIPTION 59The 60.Nm 61file system provides facilities to store file system data onto disk devices 62and is intended to replace 63.Xr ffs 5 64as the default file system for 65.Dx . 66.Pp 67Among its features are instant crash recovery, 68large file systems spanning multiple volumes, 69data integrity checking, 70data deduplication, 71fine grained history retention and snapshots, 72pseudo-filesystems (PFSs), 73mirroring capability and 74unlimited number of files and links. 75.Pp 76All functions related to managing 77.Nm 78file systems are provided by the 79.Xr newfs_hammer 8 , 80.Xr mount_hammer 8 , 81.Xr hammer 8 , 82.Xr sysctl 8 , 83.Xr chflags 1 , 84and 85.Xr undo 1 86utilities. 87.Pp 88For a more detailed introduction refer to the paper and slides listed in the 89.Sx SEE ALSO 90section. 91For some common usages of 92.Nm 93see the 94.Sx EXAMPLES 95section below. 96.Pp 97Description of 98.Nm 99features: 100.Ss Instant Crash Recovery 101After a non-graceful system shutdown, 102.Nm 103file systems will be brought back into a fully coherent state 104when mounting the file system, usually within a few seconds. 
.Pp
In the unlikely case that a
.Nm
mount fails due to the redo recovery (stage 2 recovery) being corrupted, a
workaround to skip this stage can be applied by setting the following tunable:
.Bd -literal -offset indent
vfs.hammer.skip_redo=<value>
.Ed
.Pp
Possible values are:
.Bl -tag -width indent
.It 0
Run redo recovery normally and fail to mount in the case of error (default).
.It 1
Run redo recovery but continue mounting if an error appears.
.It 2
Completely bypass redo recovery.
.El
.Pp
Related commands:
.Xr mount_hammer 8
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes;
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size,
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
has a strong focus on data integrity:
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
To save disk space data deduplication can be used.
Data deduplication will identify data blocks which occur multiple times
and only store one copy; multiple references will be made to this copy.
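.Pp
For example, offline deduplication of a mounted
.Nm
file system can first be simulated, to estimate the space saving, and then
performed (the mount point
.Pa /hammer
is illustrative):
.Bd -literal -offset indent
hammer dedup-simulate /hammer
hammer dedup /hammer
.Ed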
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words:
when a transaction is made,
.Nm
will always use higher transaction ids for following transactions.
A transaction id is given in hexadecimal format
.Li 0x016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+ snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be saved for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
Each PFS has independent inode numbers.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
this includes mirroring, offline deduping, pruning, reblocking and rebalancing.
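.Pp
For example, a master PFS can be created and then accessed via a
.Nm null
mount as recommended above (the paths are illustrative):
.Bd -literal -offset indent
hammer pfs-master /hammer/pfs/data
mount_null /hammer/pfs/data /data
.Ed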
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS.
I.e.\& for mirroring, multiple slaves per master are supported,
but multiple masters per slave are not.
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for a file or directory in the root PFS a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount for the PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported (a subdirectory may be escaped if it is exported).
.Ss File System Versions
As new features have been introduced to
.Nm ,
a version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
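.Pp
For example, to display the version of a mounted
.Nm
file system and then upgrade it to the next version (the mount point
.Pa /home
is illustrative):
.Bd -literal -offset indent
hammer version /home
hammer version-upgrade /home
.Ed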
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic clean up (taking snapshots, pruning and reblocking)
in order for history to remain accessible and for the file system not to
fill up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link, is not from after the latest snapshot,
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across a
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS auto created
by just issuing the same
.Cm mirror-copy
command; if the target PFS doesn't exist, you will be prompted
to create it.
You can even omit the prompting by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
This example NFS exports, from the
.Nm
file system
.Pa /hammer ,
the directory
.Pa /hammer/non-pfs
(which contains no PFSs) and the PFS
.Pa /hammer/pfs/data ;
the latter is
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insuffient buffers to rebalance the tree. nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/sysarch09/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file system
or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes are
cached by the drive itself tends to lead to excessive priority being given to
writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com ;
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .
.Sh CAVEATS
Data deduplication is considered experimental.