1.\" 2.\" Copyright (c) 2008 3.\" The DragonFly Project. All rights reserved. 4.\" 5.\" Redistribution and use in source and binary forms, with or without 6.\" modification, are permitted provided that the following conditions 7.\" are met: 8.\" 9.\" 1. Redistributions of source code must retain the above copyright 10.\" notice, this list of conditions and the following disclaimer. 11.\" 2. Redistributions in binary form must reproduce the above copyright 12.\" notice, this list of conditions and the following disclaimer in 13.\" the documentation and/or other materials provided with the 14.\" distribution. 15.\" 3. Neither the name of The DragonFly Project nor the names of its 16.\" contributors may be used to endorse or promote products derived 17.\" from this software without specific, prior written permission. 18.\" 19.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 20.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 21.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 22.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 23.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 24.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING, 25.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 26.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 27.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT 29.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 30.\" SUCH DAMAGE. 
31.\" 32.Dd April 19, 2011 33.Dt HAMMER 5 34.Os 35.Sh NAME 36.Nm HAMMER 37.Nd HAMMER file system 38.Sh SYNOPSIS 39To compile this driver into the kernel, 40place the following line in your 41kernel configuration file: 42.Bd -ragged -offset indent 43.Cd "options HAMMER" 44.Ed 45.Pp 46Alternatively, to load the driver as a 47module at boot time, place the following line in 48.Xr loader.conf 5 : 49.Bd -literal -offset indent 50hammer_load="YES" 51.Ed 52.Pp 53To mount via 54.Xr fstab 5 : 55.Bd -literal -offset indent 56/dev/ad0s1d[:/dev/ad1s1d:...] /mnt hammer rw 2 0 57.Ed 58.Sh DESCRIPTION 59The 60.Nm 61file system provides facilities to store file system data onto disk devices 62and is intended to replace 63.Xr ffs 5 64as the default file system for 65.Dx . 66.Pp 67Among its features are instant crash recovery, 68large file systems spanning multiple volumes, 69data integrity checking, 70data deduplication, 71fine grained history retention and snapshots, 72pseudo-filesystems (PFSs), 73mirroring capability and 74unlimited number of files and links. 75.Pp 76All functions related to managing 77.Nm 78file systems are provided by the 79.Xr newfs_hammer 8 , 80.Xr mount_hammer 8 , 81.Xr hammer 8 , 82.Xr sysctl 8 , 83.Xr chflags 1 , 84and 85.Xr undo 1 86utilities. 87.Pp 88For a more detailed introduction refer to the paper and slides listed in the 89.Sx SEE ALSO 90section. 91For some common usages of 92.Nm 93see the 94.Sx EXAMPLES 95section below. 96.Pp 97Description of 98.Nm 99features: 100.Ss Instant Crash Recovery 101After a non-graceful system shutdown, 102.Nm 103file systems will be brought back into a fully coherent state 104when mounting the file system, usually within a few seconds. 105.Pp 106Related commands: 107.Xr mount_hammer 8 108.Ss Large File Systems & Multi Volume 109A 110.Nm 111file system can be up to 1 Exabyte in size. 
It can span up to 256 volumes,
each of which occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
puts a high focus on data integrity;
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot, and the
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
To save disk space, data deduplication can be used.
Data deduplication identifies data blocks which occur multiple times
and stores only one copy; multiple references are then made to this copy.
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words, when a transaction is made,
.Nm
will always use higher transaction ids for following transactions.
A transaction id is given in hexadecimal format
.Li 0x016llx ,
such as
.Li 0x00000001061a8ba6 .
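.Pp
For example, the current transaction id can be obtained by syncing the
file system with the
.Cm synctid
command of
.Xr hammer 8 ,
which prints the transaction id of the sync in the format shown above
(the mount point below is illustrative):
.Bd -literal -offset indent
hammer synctid /home
.Ed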
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+ snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per-PFS parameter
.Cm prune-min ,
history is guaranteed to be saved for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning, a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
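.Pp
These periodic operations are normally driven by the
.Xr hammer 8
.Cm cleanup
command according to a per-PFS configuration file editable with
.Cm viconfig .
Such a configuration takes the following general form
(the periods and retention times shown are illustrative; see
.Xr hammer 8
for the actual defaults):
.Bd -literal -offset indent
snapshots 1d 60d
prune     1d 5m
rebalance 1d 5m
reblock   1d 5m
recopy    30d 10m
.Ed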
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
Each PFS has independent inode numbers.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
these include mirroring, offline deduping, pruning, reblocking and rebalancing.
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is the copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS.
That is, multiple slaves per master are supported,
but multiple masters per slave are not.
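.Pp
For example, a master PFS can be mirrored to a slave PFS on another
machine across an
.Xr ssh 1
connection with the
.Cm mirror-copy
command (the host name and PFS paths below are illustrative):
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master user@backuphost:/home/pfs/slave
.Ed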
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for a file or directory in the root PFS a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only a
.Nm null
mount of a PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported (a subdirectory may be escaped if exported).
.Ss File System Versions
As new features have been introduced to
.Nm ,
a version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic clean up (taking snapshots, pruning and reblocking)
in order to retain access to history and to keep the file system from
filling up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command, which frees historical data from the file system that is not
pointed to by any snapshot link, is not from after the latest snapshot,
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master, its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable for as long as no mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS auto-created
by just issuing the same
.Cm mirror-copy
command; if the target PFS doesn't exist you will be asked
whether you would like to create it.
You can even skip the prompt by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
To NFS export from the
.Nm
file system
.Pa /hammer
the directory
.Pa /hammer/non-pfs
without PFSs, and the PFS
.Pa /hammer/pfs/data ,
the latter is
.Nm null
mounted to
.Pa /hammer/data .
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insufficient buffers to rebalance the tree. nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/hammer/nycbsdcon/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/sysarch09/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system, or even when creating new files or directories.
And even though the kernel prioritizes reads over writes, the fact that
writes are cached by the drive itself tends to lead to excessive priority
being given to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired, it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq dillon@backplane.com ;
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .
.Sh CAVEATS
Data deduplication is considered experimental.