.\"
.\" Copyright (c) 2008
.\"	The DragonFly Project.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd September 21, 2015
.Dt HAMMER 5
.Os
.Sh NAME
.Nm HAMMER
.Nd HAMMER file system
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "options HAMMER"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hammer_load="YES"
.Ed
.Pp
To mount via
.Xr fstab 5 :
.Bd -literal -offset indent
/dev/ad0s1d[:/dev/ad1s1d:...]	/mnt hammer rw 2 0
.Ed
.Sh DESCRIPTION
The
.Nm
file system provides facilities to store file system data onto disk devices
and is intended to replace
.Xr ffs 5
as the default file system for
.Dx .
.Pp
Among its features are instant crash recovery,
large file systems spanning multiple volumes,
data integrity checking,
data deduplication,
fine grained history retention and snapshots,
pseudo-filesystems (PFSs),
mirroring capability and
an unlimited number of files and links.
.Pp
All functions related to managing
.Nm
file systems are provided by the
.Xr newfs_hammer 8 ,
.Xr mount_hammer 8 ,
.Xr hammer 8 ,
.Xr sysctl 8 ,
.Xr chflags 1 ,
and
.Xr undo 1
utilities.
.Pp
For a more detailed introduction refer to the paper and slides listed in the
.Sx SEE ALSO
section.
For some common usages of
.Nm
see the
.Sx EXAMPLES
section below.
.Pp
Description of
.Nm
features:
.Ss Instant Crash Recovery
After a non-graceful system shutdown,
.Nm
file systems will be brought back into a fully coherent state
when mounting the file system, usually within a few seconds.
.Pp
In the unlikely case that a
.Nm
mount fails due to redo recovery (stage 2 recovery) being corrupted,
a workaround to skip this stage can be applied by setting the following
tunable:
.Bd -literal -offset indent
vfs.hammer.skip_redo=<value>
.Ed
.Pp
Possible values are:
.Bl -tag -width indent
.It 0
Run redo recovery normally and fail to mount in the case of error (default).
.It 1
Run redo recovery but continue mounting if an error appears.
.It 2
Completely bypass redo recovery.
.El
.Pp
Related commands:
.Xr mount_hammer 8
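.Pp
For example, to completely bypass redo recovery on the next boot, the
tunable can be set in
.Xr loader.conf 5
(a sketch; use value 2 only as a last resort, as it skips part of crash
recovery):
.Bd -literal -offset indent
vfs.hammer.skip_redo=2
.Ed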
.Ss Large File Systems & Multi Volume
A
.Nm
file system can be up to 1 Exabyte in size.
It can span up to 256 volumes,
each volume occupies a
.Dx
disk slice or partition, or another special file,
and can be up to 4096 TB in size.
The minimum recommended
.Nm
file system size is 50 GB.
For volumes over 2 TB in size
.Xr gpt 8
and
.Xr disklabel64 8
normally need to be used.
.Pp
Related
.Xr hammer 8
commands:
.Cm volume-add ,
.Cm volume-del ,
.Cm volume-list ,
.Cm volume-blkdevs ;
see also
.Xr newfs_hammer 8
.Ss Data Integrity Checking
.Nm
places a high focus on data integrity;
CRC checks are made for all major structures and data.
.Nm
snapshots implement features to make data integrity checking easier:
the atime and mtime fields are locked to the ctime
for files accessed via a snapshot.
The
.Fa st_dev
field is based on the PFS
.Ar shared-uuid
and not on any real device.
This means that archiving the contents of a snapshot with e.g.\&
.Xr tar 1
and piping it to something like
.Xr md5 1
will yield a consistent result.
The consistency is also retained on mirroring targets.
.Ss Data Deduplication
To save disk space data deduplication can be used.
Data deduplication will identify data blocks which occur multiple times
and only store one copy; multiple references will be made to this copy.
.Pp
Related
.Xr hammer 8
commands:
.Cm dedup ,
.Cm dedup-simulate ,
.Cm cleanup ,
.Cm config
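.Pp
For example, to estimate the potential savings first and then deduplicate
a file system (the mount point is illustrative):
.Bd -literal -offset indent
hammer dedup-simulate /home
hammer dedup /home
.Ed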
.Ss Transaction IDs
The
.Nm
file system uses 64-bit transaction ids to refer to historical
file or directory data.
Transaction ids used by
.Nm
are monotonically increasing over time.
In other words:
when a transaction is made,
.Nm
will always use higher transaction ids for following transactions.
A transaction id is given in hexadecimal format
.Li 0x016llx ,
such as
.Li 0x00000001061a8ba6 .
.Pp
Related
.Xr hammer 8
commands:
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snapls ,
.Cm synctid
.Ss History & Snapshots
History metadata on the media is written with every sync operation, so that
by default the resolution of a file's history is 30-60 seconds until the next
prune operation.
Prior versions of files and directories are generally accessible by appending
.Ql @@
and a transaction id to the name.
The common way of accessing history, however, is by taking snapshots.
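.Pp
For example, a prior version of a file can be examined directly by
appending a transaction id to its name (a sketch; the path and id are
illustrative):
.Bd -literal -offset indent
hammer synctid /home
cat /home/report@@0x00000001061a8ba6
.Ed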
.Pp
Snapshots are softlinks to prior versions of directories and their files.
Their data will be retained across prune operations for as long as the
softlink exists.
Removing the softlink enables the file system to reclaim the space
again upon the next prune & reblock operations.
In
.Nm
Version 3+ snapshots are also maintained as file system meta-data.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm history ,
.Cm snapshot ,
.Cm snap ,
.Cm snaplo ,
.Cm snapq ,
.Cm snaprm ,
.Cm snapls ,
.Cm config ,
.Cm viconfig ;
see also
.Xr undo 1
.Ss Pruning & Reblocking
Pruning is the act of deleting file system history.
By default only history used by the given snapshots
and history from after the latest snapshot will be retained.
By setting the per PFS parameter
.Cm prune-min ,
history is guaranteed to be retained for at least this time interval.
All other history is deleted.
Reblocking will reorder all elements and thus defragment the file system and
free space for reuse.
After pruning a file system must be reblocked to recover all available space.
Reblocking is needed even when using the
.Cm nohistory
.Xr mount_hammer 8
option or
.Xr chflags 1
flag.
.Pp
Related
.Xr hammer 8
commands:
.Cm cleanup ,
.Cm snapshot ,
.Cm prune ,
.Cm prune-everything ,
.Cm rebalance ,
.Cm reblock ,
.Cm reblock-btree ,
.Cm reblock-inodes ,
.Cm reblock-dirs ,
.Cm reblock-data
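.Pp
For example, to guarantee that at least 30 days of history are retained on
a PFS regardless of snapshots (a sketch; the path and interval are
illustrative):
.Bd -literal -offset indent
hammer pfs-update /home prune-min=30d
.Ed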
.Ss Pseudo-Filesystems (PFSs)
A pseudo-filesystem, PFS for short, is a sub file system in a
.Nm
file system.
Each PFS has independent inode numbers.
All disk space in a
.Nm
file system is shared between all PFSs in it,
so each PFS is free to use all remaining space.
A
.Nm
file system supports up to 65536 PFSs.
The root of a
.Nm
file system is PFS# 0; it is called the root PFS and is always a master PFS.
.Pp
A PFS can be either master or slave.
Slaves are always read-only,
so they can't be updated by normal file operations, only by
.Xr hammer 8
operations like mirroring and pruning.
Upgrading slaves to masters and downgrading masters to slaves are supported.
.Pp
It is recommended to use a
.Nm null
mount to access a PFS, except for the root PFS;
this way no tools are confused by the PFS root being a symlink
and inodes not being unique across a
.Nm
file system.
.Pp
Many
.Xr hammer 8
operations operate per PFS;
these include mirroring, offline deduping, pruning, reblocking and
rebalancing.
.Pp
Related
.Xr hammer 8
commands:
.Cm pfs-master ,
.Cm pfs-slave ,
.Cm pfs-status ,
.Cm pfs-update ,
.Cm pfs-destroy ,
.Cm pfs-upgrade ,
.Cm pfs-downgrade ;
see also
.Xr mount_null 8
.Ss Mirroring
Mirroring is copying of all data in a file system, including snapshots
and other historical data.
In order to allow inode numbers to be duplicated on the slaves, the
.Nm
mirroring feature uses PFSs.
A master or slave PFS can be mirrored to a slave PFS.
I.e.\& for mirroring multiple slaves per master are supported,
but multiple masters per slave are not.
.Nm
does not support multi-master clustering and mirroring.
.Pp
Related
.Xr hammer 8
commands:
.Cm mirror-copy ,
.Cm mirror-stream ,
.Cm mirror-read ,
.Cm mirror-read-stream ,
.Cm mirror-write ,
.Cm mirror-dump
.Ss Fsync Flush Modes
The
.Nm
file system implements several different
.Fn fsync
flush modes; the mode used is set via the
.Va vfs.hammer.flush_mode
sysctl, see
.Xr hammer 8
for details.
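.Pp
The current mode can be inspected and changed with
.Xr sysctl 8
(the value shown is illustrative; see
.Xr hammer 8
for the meaning of each mode):
.Bd -literal -offset indent
sysctl vfs.hammer.flush_mode
sysctl vfs.hammer.flush_mode=1
.Ed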
.Ss Unlimited Number of Files and Links
There is no limit on the number of files or links in a
.Nm
file system, apart from available disk space.
.Ss NFS Export
.Nm
file systems support NFS export.
NFS export of PFSs is done using
.Nm null
mounts (for a file or directory in the root PFS a
.Nm null
mount is not needed).
For example, to export the PFS
.Pa /hammer/pfs/data ,
create a
.Nm null
mount, e.g.\& to
.Pa /hammer/data ,
and export the latter path.
.Pp
Don't export a directory containing a PFS (e.g.\&
.Pa /hammer/pfs
above).
Only the
.Nm null
mount of a PFS root
(e.g.\&
.Pa /hammer/data
above) should be exported
(if a subdirectory is exported, it may be possible to escape it).
.Ss File System Versions
As new features have been introduced to
.Nm ,
the file system version number has been bumped.
Each
.Nm
file system has a version, which can be upgraded to support new features.
.Pp
Related
.Xr hammer 8
commands:
.Cm version ,
.Cm version-upgrade ;
see also
.Xr newfs_hammer 8
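.Pp
For example, to display the version of a mounted file system and upgrade
it (a sketch; the mount point and target version are illustrative):
.Bd -literal -offset indent
hammer version /home
hammer version-upgrade /home 6
.Ed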
.Sh EXAMPLES
.Ss Preparing the File System
To create and mount a
.Nm
file system use the
.Xr newfs_hammer 8
and
.Xr mount_hammer 8
commands.
Note that all
.Nm
file systems must have a unique name on a per-machine basis.
.Bd -literal -offset indent
newfs_hammer -L HOME /dev/ad0s1d
mount_hammer /dev/ad0s1d /home
.Ed
.Pp
Similarly, multi volume file systems can be created and mounted by
specifying additional arguments.
.Bd -literal -offset indent
newfs_hammer -L MULTIHOME /dev/ad0s1d /dev/ad1s1d
mount_hammer /dev/ad0s1d /dev/ad1s1d /home
.Ed
.Pp
Once created and mounted,
.Nm
file systems need periodic cleanup (taking snapshots, pruning and
reblocking) in order to retain access to history and to keep the file
system from filling up.
For this it is recommended to use the
.Xr hammer 8
.Cm cleanup
metacommand.
.Pp
By default,
.Dx
is set up to run
.Nm hammer Cm cleanup
nightly via
.Xr periodic 8 .
.Pp
It is also possible to perform these operations individually via
.Xr crontab 5 .
For example, to reblock the
.Pa /home
file system every night at 2:15 for up to 5 minutes:
.Bd -literal -offset indent
15 2 * * * hammer -c /var/run/HOME.reblock -t 300 reblock /home \e
	>/dev/null 2>&1
.Ed
.Ss Snapshots
The
.Xr hammer 8
utility's
.Cm snapshot
command provides several ways of taking snapshots.
They all assume a directory where snapshots are kept.
.Bd -literal -offset indent
mkdir /snaps
hammer snapshot /home /snaps/snap1
(...after some changes in /home...)
hammer snapshot /home /snaps/snap2
.Ed
.Pp
The softlinks in
.Pa /snaps
point to the state of the
.Pa /home
directory at the time each snapshot was taken, and could now be used to copy
the data somewhere else for backup purposes.
.Pp
By default,
.Dx
is set up to create nightly snapshots of all
.Nm
file systems via
.Xr periodic 8
and to keep them for 60 days.
.Ss Pruning
A snapshot directory is also the argument to the
.Xr hammer 8
.Cm prune
command which frees historical data from the file system that is not
pointed to by any snapshot link, is not from after the latest snapshot,
and is older than
.Cm prune-min .
.Bd -literal -offset indent
rm /snaps/snap1
hammer prune /snaps
.Ed
.Ss Mirroring
Mirroring is set up using
.Nm
pseudo-filesystems (PFSs).
To associate the slave with the master its shared UUID should be set to
the master's shared UUID as output by the
.Nm hammer Cm pfs-master
command.
.Bd -literal -offset indent
hammer pfs-master /home/pfs/master
hammer pfs-slave /home/pfs/slave shared-uuid=<master's shared uuid>
.Ed
.Pp
The
.Pa /home/pfs/slave
link is unusable until an initial mirroring operation has taken place.
.Pp
To mirror the master's data, either pipe a
.Cm mirror-read
command into a
.Cm mirror-write
or, as a short-cut, use the
.Cm mirror-copy
command (which works across an
.Xr ssh 1
connection as well).
The initial mirroring operation has to be done to the PFS path (as
.Xr mount_null 8
can't access it yet).
.Bd -literal -offset indent
hammer mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
It is also possible to have the target PFS auto-created by just issuing
the same
.Cm mirror-copy
command; if the target PFS doesn't exist you will be prompted
whether you would like to create it.
The prompt can be suppressed by using the
.Fl y
flag:
.Bd -literal -offset indent
hammer -y mirror-copy /home/pfs/master /home/pfs/slave
.Ed
.Pp
After this initial step a
.Nm null
mount can be set up for
.Pa /home/pfs/slave .
Further operations can use
.Nm null
mounts.
.Bd -literal -offset indent
mount_null /home/pfs/master /home/master
mount_null /home/pfs/slave /home/slave

hammer mirror-copy /home/master /home/slave
.Ed
.Ss NFS Export
Suppose the
.Nm
file system
.Pa /hammer
contains the directory
.Pa /hammer/non-pfs
(not a PFS) and the PFS
.Pa /hammer/pfs/data ,
with the latter
.Nm null
mounted to
.Pa /hammer/data ;
to NFS export both:
.Pp
Add to
.Pa /etc/fstab
(see
.Xr fstab 5 ) :
.Bd -literal -offset indent
/hammer/pfs/data /hammer/data null rw
.Ed
.Pp
Add to
.Pa /etc/exports
(see
.Xr exports 5 ) :
.Bd -literal -offset indent
/hammer/non-pfs
/hammer/data
.Ed
.Sh DIAGNOSTICS
.Bl -diag
.It "hammer: System has insuffient buffers to rebalance the tree.  nbuf < %d"
Rebalancing a
.Nm
PFS uses quite a bit of memory and
can't be done on low memory systems.
It has been reported to fail on 512MB systems.
Rebalancing isn't critical for
.Nm
file system operation;
it is done by
.Nm hammer
.Cm rebalance ,
often as part of
.Nm hammer
.Cm cleanup .
.El
.Sh SEE ALSO
.Xr chflags 1 ,
.Xr md5 1 ,
.Xr tar 1 ,
.Xr undo 1 ,
.Xr exports 5 ,
.Xr ffs 5 ,
.Xr fstab 5 ,
.Xr disklabel64 8 ,
.Xr gpt 8 ,
.Xr hammer 8 ,
.Xr mount_hammer 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer 8 ,
.Xr periodic 8 ,
.Xr sysctl 8
.Rs
.%A Matthew Dillon
.%D June 2008
.%O http://www.dragonflybsd.org/hammer/hammer.pdf
.%T "The HAMMER Filesystem"
.Re
.Rs
.%A Matthew Dillon
.%D October 2008
.%O http://www.dragonflybsd.org/presentations/nycbsdcon08/
.%T "Slideshow from NYCBSDCon 2008"
.Re
.Rs
.%A Michael Neumann
.%D January 2010
.%O http://www.ntecs.de/talks/HAMMER.pdf
.%T "Slideshow for a presentation held at KIT (http://www.kit.edu)"
.Re
.Sh FILESYSTEM PERFORMANCE
The
.Nm
file system has a front-end which processes VNOPS and issues necessary
block reads from disk, and a back-end which handles meta-data updates
on-media and performs all meta-data write operations.
Bulk file write operations are handled by the front-end.
Because
.Nm
defers meta-data updates, virtually no meta-data read operations will be
issued by the front-end while writing large amounts of data to the file
system, or even when creating new files or directories.
Even though the kernel prioritizes reads over writes, the fact that writes
are cached by the drive itself tends to lead to excessive priority being
given to writes.
.Pp
There are four bioq sysctls, shown below with default values,
which can be adjusted to give reads a higher priority:
.Bd -literal -offset indent
kern.bioq_reorder_minor_bytes: 262144
kern.bioq_reorder_burst_bytes: 3000000
kern.bioq_reorder_minor_interval: 5
kern.bioq_reorder_burst_interval: 60
.Ed
.Pp
If a higher read priority is desired it is recommended that the
.Va kern.bioq_reorder_minor_interval
be increased to 15, 30, or even 60, and the
.Va kern.bioq_reorder_burst_bytes
be decreased to 262144 or 524288.
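.Pp
For example, to apply this tuning at runtime with
.Xr sysctl 8
(the exact values chosen from the ranges above are illustrative):
.Bd -literal -offset indent
sysctl kern.bioq_reorder_minor_interval=30
sysctl kern.bioq_reorder_burst_bytes=262144
.Ed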
.Sh HISTORY
The
.Nm
file system first appeared in
.Dx 1.11 .
.Sh AUTHORS
.An -nosplit
The
.Nm
file system was designed and implemented by
.An Matthew Dillon Aq Mt dillon@backplane.com ;
data deduplication was added by
.An Ilya Dryomov .
This manual page was written by
.An Sascha Wildner
and updated by
.An Thomas Nikolajsen .