.\"
.\" Copyright (c) 2008
.\" The DragonFly Project. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd September 21, 2015
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "options HAMMER"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
.Pp
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
data deduplication,
fine-grained history retention and snapshots,
pseudo-filesystems (PFSs),
mirroring capability and
an unlimited number of files and links.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr sysctl 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Pp
Description of
.Nm
features:
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
.Pp
In the unlikely case that a
.Nm
mount fails due to redo recovery (stage 2 recovery) being corrupted, a
workaround to skip this stage can be applied by setting the following tunable:
.Bd -literal -offset indent
vfs.hammer.skip_redo=<value>
.Ed
.Pp
Possible values are:
.Bl -tag -width indent
.It 0
Run redo recovery normally and fail to mount in the case of error (default).
.It 1
Run redo recovery but continue mounting if an error appears.
.It 2
Completely bypass redo recovery.
.El
.Pp
Related commands:
.Xr mount_hammer 8
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes;
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ,
.Cm volume-blkdevs ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
places a strong focus on data integrity;
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
To save disk space, data deduplication can be used.
Data deduplication identifies data blocks which occur multiple times
and stores only one copy; multiple references are then made to this copy.
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words:
when a transaction is made,
.Nm
will always use higher transaction ids for following transactions.
A transaction id is given in hexadecimal format
.Li 0x%016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+ snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
Each PFS has independent inode numbers.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
this includes mirroring, offline deduping, pruning, reblocking and rebalancing.
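.Pp
As a minimal sketch of the recommended access pattern, a master PFS can be
created, made available through a
.Nm null
mount, and then maintained through that mount point.
The paths
.Pa /home/pfs/data
and
.Pa /home/data
are hypothetical (the directory
.Pa /home/pfs
is assumed to exist already), and
.Cm reblock
is shown only as one example of a per-PFS operation:
.Bd -literal -offset indent
# create a master PFS (path is an assumption for illustration)
hammer pfs-master /home/pfs/data
# access it through a null mount rather than through the PFS symlink
mkdir /home/data
mount_null /home/pfs/data /home/data
# per-PFS operations can then be given the mount point
hammer reblock /home/data
.Ed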
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS.
I.e.\& for mirroring, multiple slaves per master are supported,
but multiple masters per slave are not.
.Nm
does not support multi-master clustering and mirroring.
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for files and directories in the root PFS a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount for a PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported (a subdirectory may be escaped if exported).
.Ss File System Versions
As new features have been introduced to
.Nm ,
its version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic cleanup (making snapshots, pruning and reblocking)
in order to retain access to history and to keep the file system from
filling up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command which frees historical data from the file system that is not
pointed to by any snapshot link and is not from after the latest snapshot
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS created automatically
by just issuing the same
.Cm mirror-copy
command; if the target PFS doesn't exist you will be prompted
whether you would like to create it.
The prompt can be skipped by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
To NFS export from the
.Nm
file system
.Pa /hammer
the directory
.Pa /hammer/non-pfs
without PFSs, and the PFS
.Pa /hammer/pfs/data ,
the latter is
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insuffient buffers to rebalance the tree. nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/presentations/nycbsdcon08/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/talks/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file system
or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to lead to excessive priority given
to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq Mt dillon@backplane.com ,
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .