.\"
.\" Copyright (c) 2008
.\" The DragonFly Project.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 7, 2017
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "options HAMMER"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
.Pp
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
data deduplication,
fine grained history retention and snapshots,
pseudo-filesystems (PFSs),
mirroring capability, and
an unlimited number of files and links.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr sysctl 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Pp
Description of
.Nm
features:
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
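.Pp
No separate file system check utility needs to be run after a crash;
recovery happens automatically the next time the file system is mounted,
for example (device and mount point are illustrative):
.Bd -literal -offset indent
mount_hammer /dev/ad0s1d /home
.Ed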
.Pp
In the unlikely case a
.Nm
mount fails due to redo recovery (stage 2 recovery) being corrupted, a
workaround to skip this stage can be applied by setting the following tunable:
.Bd -literal -offset indent
vfs.hammer.skip_redo=<value>
.Ed
.Pp
Possible values are:
.Bl -tag -width indent
.It 0
Run redo recovery normally and fail to mount in the case of error (default).
.It 1
Run redo recovery but continue mounting if an error appears.
.It 2
Completely bypass redo recovery.
.El
.Pp
Related commands:
.Xr mount_hammer 8
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes;
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ,
.Cm volume-blkdevs ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
has a high focus on data integrity:
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
To save disk space data deduplication can be used.
Data deduplication will identify data blocks which occur multiple times
and only store one copy; multiple references will be made to this copy.
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words,
when a transaction is made,
.Nm
will always use higher transaction ids for following transactions.
A transaction id is given in hexadecimal format
.Li 0x016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+, snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be saved for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A non-root PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
this includes mirroring, offline deduping, pruning, reblocking and rebalancing.
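.Pp
As a minimal sketch (the paths are illustrative), a PFS is created with the
.Cm pfs-master
directive and then accessed through a
.Nm null
mount:
.Bd -literal -offset indent
hammer pfs-master /hammer/pfs/test
mount_null /hammer/pfs/test /hammer/test
.Ed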
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is the copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS;
i.e.\& for mirroring, multiple slaves per master are supported,
but multiple masters per slave are not.
.Nm
does not support multi-master clustering and mirroring.
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for a file or directory in the root PFS, a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount for the PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported (a subdirectory may be escaped if exported).
.Ss File System Versions
As new features have been introduced to
.Nm ,
a version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic clean up (taking snapshots, pruning and
reblocking) in order to keep history accessible and to keep the file
system from filling up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
    >/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link, is not from after the latest snapshot,
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS auto-created
by just issuing the same
.Cm mirror-copy
command; if the target PFS doesn't exist you will be prompted
whether you would like to create it.
You can even omit the prompting by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
To NFS export from the
.Nm
file system
.Pa /hammer
the directory
.Pa /hammer/non-pfs
without PFSs, and the PFS
.Pa /hammer/pfs/data ,
the latter is
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insufficient buffers to rebalance the tree. nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/presentations/nycbsdcon08/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/talks/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end, which processes VNOPS and issues necessary
block reads from disk, and a back-end, which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file system
or even when creating new files or directories, and even though the
kernel prioritizes reads over writes, the fact that writes are cached by
the drive itself tends to lead to excessive priority given to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and that
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq Mt dillon@backplane.com ;
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .