.\" Copyright (c) 2015 The DragonFly Project.  All rights reserved.
.\"
.\" This code is derived from software contributed to The DragonFly Project
.\" by Matthew Dillon <dillon@backplane.com>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd August 19, 2019
.Dt HAMMER2 8
.Os
.Sh NAME
.Nm hammer2
.Nd hammer2 file system utility
.Sh SYNOPSIS
.Nm
.Fl h
.Nm
.Op Fl s Ar path
.Op Fl t Ar type
.Op Fl u Ar uuid
.Op Fl m Ar mem
.Ar command
.Op Ar argument ...
.Sh DESCRIPTION
The
.Nm
utility provides miscellaneous support functions for a
HAMMER2 file system.
.Pp
The options are as follows:
.Bl -tag -width indent
.It Fl s Ar path
Specify the path to a mounted HAMMER2 filesystem.
At least one PFS on a HAMMER2 filesystem must be mounted for the system
to act on all PFSs managed by it.
Every HAMMER2 filesystem typically has a PFS called "LOCAL" for this purpose.
.It Fl t Ar type
Specify the type when creating, upgrading, or downgrading a PFS.
Supported types are MASTER, SLAVE, SOFT_MASTER, SOFT_SLAVE, CACHE, and DUMMY.
If not specified, the pfs-create directive defaults to MASTER when no
uuid is given, and to SLAVE when a uuid is given.
.It Fl u Ar uuid
Specify the cluster uuid when creating a PFS.  If not specified, a unique,
random uuid will be generated.
Note that every PFS also has a unique pfs_id which is always generated
and cannot be overridden with an option.
The { pfs_clid, pfs_fsid } tuple uniquely identifies a component of a cluster.
.It Fl m Ar mem
Specify how much tracking memory to use for certain directives.
At the moment, this option is only applicable to the
.Cm bulkfree
directive, allowing it to operate in fewer passes when given more memory.
A nominal value for a heavily populated 4TB drive would be around
a gigabyte ('-m 1g').
.El
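.Pp
For example, assuming a HAMMER2 PFS mounted at the hypothetical mount point
.Pa /mnt/data ,
the options combine with the directives described below as follows:
.Bd -literal -offset indent
# list the PFSs on the filesystem backing /mnt/data
hammer2 -s /mnt/data pfs-list

# run a bulkfree pass with roughly 1GB of tracking memory
hammer2 -m 1g bulkfree /mnt/data
.Ed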
.Pp
.Nm
directives are as shown below.
Note that most directives require you to either be CD'd into a hammer2
filesystem, specify a path to a mounted hammer2 filesystem via the
.Fl s
option, or specify a path after the directive.
It depends on the directive.
All hammer2 filesystems have a PFS called "LOCAL" which is typically mounted
locally on the host in order to be able to issue commands for other PFSs
on the filesystem.
The mount also enables PFS configuration scanning for that filesystem.
Examples of typical invocations follow the directive list below.
.Bl -tag -width indent
.\" ==== cleanup ====
.It Cm cleanup Op path
Perform manual cleanup passes on the specified path, or on all mounted
partitions if no path is given.
.\" ==== connect ====
.It Cm connect Ar target
Add a cluster link entry to the volume header.
The volume header can support up to 255 link entries.
This feature is not currently used.
.\" ==== destroy ====
.It Cm destroy Ar path...
Destroy the specified directory entry in a hammer2 filesystem.  This bypasses
all normal checks and will unconditionally destroy the directory entry.
The underlying inode is not checked and, if it does exist, its nlinks count
is not decremented.
This directive should only be used to destroy a corrupted directory entry
which no longer has a working inode.
.Pp
Note that this command may desynchronize the system namecache for the
specified entry.  If this happens, you may have to unmount and remount the
filesystem.
.\" ==== destroy-inum ====
.It Cm destroy-inum Ar path...
Destroy the specified inode in a hammer2 filesystem.
.\" ==== disconnect ====
.It Cm disconnect Ar target
Delete a cluster link entry from the volume header.
This feature is not currently used.
.\" ==== info ====
.It Cm info Op devpath...
Access and print the status and super-root entries for all HAMMER2
partitions found in /dev/serno or the specified device path(s).
The partitions do not have to be mounted.
Note that only mounted partitions will be under active management.
This is accomplished by mounting at least one PFS within the partition.
Typically at least the @LOCAL PFS is mounted.
.\" ==== mountall ====
.It Cm mountall Op devpath...
This directive mounts the @LOCAL PFS on all HAMMER2 partitions found
in /dev/serno, or the specified device path(s).
The partitions are mounted as /var/hammer2/LOCAL.<id>.
Mounts are executed in the background and this command will wait a
limited amount of time for the mounts to complete before returning.
.\" ==== status ====
.It Cm status Op path...
Dump a list of all cluster link entries configured in the volume header.
.\" ==== hash ====
.It Cm hash Op filename...
Compute and print the directory hash for any number of filenames.
.\" ==== dhash ====
.It Cm dhash Op filename...
Compute and print the data hash used for long directory entries for any
number of filenames.
.\" ==== pfs-list ====
.It Cm pfs-list Op path...
List all local PFSs available on a mounted HAMMER2 filesystem, their type,
and their current status.
You must mount at least one PFS in order to be able to access the whole list.
.\" ==== pfs-clid ====
.It Cm pfs-clid Ar label
Print the cluster id for a PFS specified by name.
.\" ==== pfs-fsid ====
.It Cm pfs-fsid Ar label
Print the unique filesystem id for a PFS specified by name.
.\" ==== pfs-create ====
.It Cm pfs-create Ar label
Create a local PFS on a mounted HAMMER2 filesystem.
If no uuid is specified the pfs-type defaults to MASTER.
If a uuid is specified via the
.Fl u
option the pfs-type defaults to SLAVE.
Other types can be specified with the
.Fl t
option.
.Pp
If you wish to add a MASTER to an existing cluster, you must first add it as
a SLAVE and then upgrade it to MASTER to properly synchronize it.
.Pp
The DUMMY pfs-type is used to tie network-accessible clusters into the local
machine when no local storage is desired.
This type should be used on minimal H2 partitions or entirely in ram for
netboot-centric systems to provide a tie-in point for the mount command,
or on more complex systems where you need to also access network-centric
clusters.
.Pp
The CACHE or SLAVE pfs-type is typically used when the main store is on
the network but local storage is desired to improve performance.
SLAVE is also used when a backup is desired.
.Pp
Generally speaking, you can mount any PFS element of a cluster in order to
access the cluster via the full cluster protocol.
There are two exceptions.
If you mount a SOFT_SLAVE or a SOFT_MASTER then soft quorum semantics are
employed... the soft slave or soft master's current state will always be used
and the quorum protocol will not be used.  The soft PFS will still be
synchronized to masters in the background when available.
Also, you can use
.Sq mount -o local
to mount ONLY a local HAMMER2 PFS and
not run any network or quorum protocols for the mount.
All such mounts except for a SOFT_MASTER mount will be read-only.
Other than that, you will be mounting the whole cluster when you mount any
PFS within the cluster.
.Pp
DUMMY - Create a PFS skeleton intended to be the mount point for a
more complex cluster, probably one that is entirely network based.
No data will be synchronized to this PFS so it is suitable for use
in a network boot image or memory filesystem.
This allows you to create placeholders for mount points on your local
disk, SSD, or memory disk.
.Pp
CACHE - Create a PFS for caching portions of the cluster piecemeal.
This is similar to a SLAVE but does not synchronize the entire contents of
the cluster to the PFS.
Elements found in the CACHE PFS which are validated against the cluster
will be read, presumably a faster access than having to go to the cluster.
Only local CACHEs will be updated.
Network-accessible CACHE PFSs might be read but will not be written to.
If you have a large hard-drive-based cluster you can set up localized
SSD CACHE PFSs to improve performance.
.Pp
SLAVE - Create a PFS which maintains synchronization with and provides a
read-only copy of the cluster.
HAMMER2 will prioritize local SLAVEs for data retrieval after validating
their transaction id against the cluster.
The difference between a CACHE and a SLAVE is that the SLAVE is synchronized
to a full copy of the cluster and thus can serve as a backup or be staged
for use as a MASTER later on.
.Pp
SOFT_SLAVE - Create a PFS which maintains synchronization with and provides
a read-only copy of the cluster.
This is one of the special mount cases.  A SOFT_SLAVE will synchronize with
the cluster when the cluster is available, but can still be accessed when
the cluster is not available.
.Pp
MASTER - Create a PFS which will hold a master copy of the cluster.
If you create several MASTER PFSs with the same cluster id you are
effectively creating a multi-master cluster and causing a quorum and
cache coherency protocol to be used to validate operations.
The total number of masters is stored in each PFS making up the cluster.
Filesystem operations will stall for normal mounts if a quorum cannot be
obtained to validate the operation.
MASTER nodes which go offline and return later will synchronize in the
background.
Note that when adding a MASTER to an existing cluster you must add the
new PFS as a SLAVE and then upgrade it to a MASTER.
.Pp
SOFT_MASTER - Create a PFS which maintains synchronization with and provides
a read-write copy of the cluster.
This is one of the special mount cases.  A SOFT_MASTER will synchronize with
the cluster when the cluster is available, but can still be read AND written
to even when the cluster is not available.
Modifications made to a SOFT_MASTER will be automatically flushed to the
cluster when it becomes accessible again, and vice versa.
Manual intervention may be required if a conflict occurs during
synchronization.
.\" ==== pfs-delete ====
.It Cm pfs-delete Ar label
Delete a local PFS on a mounted HAMMER2 filesystem.
Deleting a PFS of type MASTER requires first downgrading it to a SLAVE (XXX).
.\" ==== snapshot ====
.It Cm snapshot Ar path Op label
Create a snapshot of a directory.
This can only be used on a local PFS, and is only really useful if the PFS
contains a complete copy of what you desire to snapshot, which typically
means a local MASTER, SOFT_MASTER, SLAVE, or SOFT_SLAVE must be present.
Snapshots are created simply by flushing a PFS mount to disk and then copying
the directory inode to the PFS.
The topology is snapshotted without having to be copied or scanned.
Snapshots are effectively separate from the cluster they came from
and can be used as a starting point for a new cluster.
So unless you build a new cluster from the snapshot, it will stay local
to the machine it was made on.
.\" ==== snapshot-debug ====
.It Cm snapshot-debug Ar path Op label
Snapshot without filesystem sync.
.\" ==== service ====
.It Cm service
Start the
.Nm
service daemon.
This daemon is also automatically started when you run
.Xr mount_hammer2 8 .
The hammer2 service daemon handles incoming TCP connections and maintains
outgoing TCP connections.  It will interconnect available services on the
machine (e.g. hammer2 mounts and xdisks) to the network.
.\" ==== stat ====
.It Cm stat Op path...
Print the inode statistics, compression, and other meta-data associated
with a list of paths.
.\" ==== leaf ====
.It Cm leaf
XXX
.\" ==== shell ====
.It Cm shell Op host
Start a debug shell to the local hammer2 service daemon via the DMSG protocol.
.\" ==== debugspan ====
.It Cm debugspan Ar target
(do not use)
.\" ==== rsainit ====
.It Cm rsainit Op path
Create the
.Pa /etc/hammer2
directory and initialize a public/private keypair in that directory for
use by the network cluster protocols.
.\" ==== show ====
.It Cm show Ar devpath
Dump the radix tree for the HAMMER2 filesystem by scanning a
block device directly.  No mount is required.
.\" ==== freemap ====
.It Cm freemap Ar devpath
Dump the freemap tree for the HAMMER2 filesystem by scanning a
block device directly.  No mount is required.
306.It Cm setcomp Ar mode[:level] Ar path...
307Set the compression mode as specified for any newly created elements at or
308under the path if not overridden by deeper elements.
309Available modes are none, autozero, lz4, or zlib.
310When zlib is used the compression level can be set.
311The default will be 6 which is the best trade-off between performance and
312time.
313.Pp
314newfs_hammer2 will set the default compression to lz4 which prioritizes
315speed over performance.
316Also note that HAMMER2 contains a heuristic and will not attempt to
317compress every block if it detects a sufficient amount of uncompressable
318data.
319.Pp
320Hammer2 compression is only effective when it can reduce the size of dataset
321(typically a 64KB block) by one or more powers of 2.  A 64K block which
322only compresses to 40K will not yield any storage improvement.
323.Pp
324Generally speaking you do not want to set the compression mode to
325.Sq none ,
326as this will cause blocks of all-zeros to be written as all-zero blocks,
327instead of holes.  The
328.Sq autozero
329compression mode detects blocks of all-zeros
330and writes them as holes.  However, HAMMER2 will rewrite data in-place if
331the compression mode is set to
332.Sq none
333and the check code is set to
334.Sq  disabled .
335Formal snapshots will still snapshot such files.  However,
336de-duplication will no longer function on the data blocks.
.\" ==== setcheck ====
.It Cm setcheck Ar check Ar path...
Set the check code as specified for any newly created elements at or under
the path if not overridden by deeper elements.
Available codes are default, disabled, crc32, xxhash64, or sha192.
.\" ==== clrcheck ====
.It Cm clrcheck Op path...
Clear the check code override for the specified paths.
Overrides may still be present in deeper elements.
.\" ==== setcrc32 ====
.It Cm setcrc32 Op path...
Set the check code to the iSCSI 32-bit CRC for any newly created elements
at or under the path if not overridden by deeper elements.
.\" ==== setxxhash64 ====
.It Cm setxxhash64 Op path...
Set the check code to XXHASH64, a fast 64-bit hash, for any newly created
elements at or under the path if not overridden by deeper elements.
.\" ==== setsha192 ====
.It Cm setsha192 Op path...
Set the check code to SHA192 for any newly created elements at or under
the path if not overridden by deeper elements.
.\" ==== bulkfree ====
.It Cm bulkfree Ar path
Run a bulkfree pass on a HAMMER2 mount.
You can specify any PFS for the mount; the bulkfree pass is run on the
entire partition.
Note that it takes two passes to actually free space.
By default this directive will use up to 1/16 of physical memory to track
the freemap.  The amount of memory used may be overridden with the
.Op Fl m Ar mem
option.
.\" ==== printinode ====
.It Cm printinode Ar path
Dump the inode contents for the specified path.
.\" ==== dumpchain ====
.It Cm dumpchain Op path Op chnflags
Dump the in-memory chain topology.
.El
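.Pp
The following examples illustrate several of the directives above.
The mount points, labels, and values shown are hypothetical; substitute
values appropriate for your system.
.Bd -literal -offset indent
# show the status of all HAMMER2 partitions found in /dev/serno
hammer2 info

# snapshot the PFS mounted at /mnt/data under a chosen label
hammer2 snapshot /mnt/data data.20190819

# use zlib level 6 compression for new files created under /mnt/data/logs
hammer2 setcomp zlib:6 /mnt/data/logs

# use the SHA192 check code for new files created under /mnt/data/archive
hammer2 setcheck sha192 /mnt/data/archive
.Ed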
.Sh SYSCTLS
.Bl -tag -width indent
.It Va vfs.hammer2.dedup_enable (default on)
Enables live de-duplication.  Any recently read data that is on-media
(already synchronized to media) is tested against pending writes for
compatibility.  If a match is found, the write will reference the
existing on-media data instead of writing new data.
.It Va vfs.hammer2.always_compress (default off)
This disables the H2 compression heuristic and forces H2 to always
try to compress data blocks, even if they look uncompressible.
Enabling this option reduces performance but has higher de-duplication
repeatability.
.It Va vfs.hammer2.cluster_data_read (default 4)
.It Va vfs.hammer2.cluster_meta_read (default 1)
Set the amount of read-ahead clustering to perform on data and meta-data
blocks.
.It Va vfs.hammer2.cluster_write (default 4)
Set the amount of write-behind clustering to perform in buffers.  Each
buffer represents 64KB.  The default is 4 and higher values typically do
not improve performance.  A value of 0 disables clustered writes.
This variable applies to the underlying media device, not to logical
file writes, so it should not interfere with temporary file optimization.
Generally speaking you want this enabled to generate smoothly pipelined
writes to the media.
.It Va vfs.hammer2.bulkfree_tps (default 5000)
Set bulkfree's maximum scan rate.  This is primarily intended to limit
I/O utilization on SSDs and CPU utilization when the meta-data is mostly
cached in memory.
.El
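.Pp
These variables can be inspected and adjusted with
.Xr sysctl 8 .
For example, to lower bulkfree's scan rate on a system where bulkfree
competes with live traffic (the value shown is only illustrative):
.Bd -literal -offset indent
# show the current limit, then reduce it
sysctl vfs.hammer2.bulkfree_tps
sysctl vfs.hammer2.bulkfree_tps=2000
.Ed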
.Sh SETTING UP /etc/hammer2
The
.Sq rsainit
directive will create the
.Pa /etc/hammer2
directory with appropriate permissions and also generate a public/private
key pair in this directory for the machine.  These files will be
.Pa rsa.pub
and
.Pa rsa.prv
and needless to say, the private key shouldn't leave the host.
.Pp
The service daemon will also scan the
.Pa /etc/hammer2/autoconn
file which contains a list of hosts which it will automatically maintain
connections to in order to form your cluster.
The service daemon will automatically reconnect on any failure and will
also monitor the file for changes.
.Pp
When the service daemon receives a connection it expects to find a
public key for that connection in a file in
.Pa /etc/hammer2/remote/
called
.Pa <IPADDR>.pub .
You normally copy the
.Pa rsa.pub
key from the host in question to this file.
The IP address must match exactly or the connection will not be allowed.
.Pp
If you want to use an unencrypted connection you can create empty,
dummy files in the remote directory in the form
.Pa <IPADDR>.none .
We do not recommend using unencrypted connections.
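.Pp
As a sketch of the setup, assuming two hosts with the hypothetical
addresses 10.0.0.1 and 10.0.0.2, you might run:
.Bd -literal -offset indent
# on each host: generate /etc/hammer2 and the machine's RSA key pair
hammer2 rsainit

# on 10.0.0.2: install the rsa.pub key copied over from 10.0.0.1
cp /path/to/copied/rsa.pub /etc/hammer2/remote/10.0.0.1.pub
.Ed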
436.Sh CLUSTER SERVICES
437Currently there are two services which use the cluster network infrastructure,
438HAMMER2 mounts and XDISK.
439Any HAMMER2 mount will make all PFSs for that filesystem available to the
440cluster.
441And if the XDISK kernel module is loaded, the hammer2 service daemon will make
442your machine's block devices available to the cluster (you must load the
443xdisk.ko kernel module before starting the hammer2 service).
444They will show up as
445.Pa /dev/xa*
446and
447.Pa /dev/serno/*
448devices on the remote machines making up the cluster.
449Remote block devices are just what they appear to be... direct access to a
450block device on a remote machine.  If the link goes down remote accesses
451will stall until it comes back up again, then automatically requeue any
452pending I/O and resume as if nothing happened.
453However, if the server hosting the physical disks crashes or is rebooted,
454any remote opens to its devices will see a permanent I/O failure requiring a
455close and open sequence to re-establish.
456The latter is necessary because the server's drives might not have committed
457the data before the crash, but had already acknowledged the transfer.
458.Pp
459Data commits work exactly the same as they do for real block devices.
460The originater must issue a BUF_CMD_FLUSH.
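.Pp
A minimal sketch of bringing these services up on a host, assuming block
device export via xdisk is desired:
.Bd -literal -offset indent
# make local block devices exportable to the cluster (optional)
kldload xdisk

# start the service daemon if it is not already running
hammer2 service
.Ed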
.Sh ADDING A NEW MASTER TO A CLUSTER
When you
.Xr newfs_hammer2 8
a HAMMER2 filesystem or use the
.Sq pfs-create
directive on one already mounted
to create a new PFS, with no special options, you wind up with a PFS
typed as a MASTER with a unique cluster uuid, but because there is only one
PFS for that cluster (for each PFS you create via pfs-create), it will
act just like a normal filesystem and does not require any special
protocols to operate.
.Pp
If you use the
.Sq pfs-create
directive along with the
.Fl u
option to specify a cluster uuid that already exists in the cluster,
you are adding a PFS to an existing cluster and this can trigger a whole
series of events in the background.
When you specify the
.Fl u
option in a
.Sq pfs-create ,
.Nm
will by default create a SLAVE PFS.
In fact, this is what must be created first even if you want to add a new
MASTER to your cluster.
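.Pp
A sketch of this step, using hypothetical mount points and a hypothetical
PFS label (the cluster uuid would be the one reported for the existing PFS):
.Bd -literal -offset indent
# obtain the cluster uuid of the existing PFS named DATA
hammer2 -s /mnt/data pfs-clid DATA

# create a SLAVE member of that cluster on another HAMMER2 filesystem
hammer2 -s /mnt/spare -u <uuid-from-above> -t SLAVE pfs-create DATA
.Ed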
.Pp
The most common action a system admin will want to take is to upgrade or
downgrade a PFS.
A new MASTER can be added to the cluster by upgrading an existing SLAVE
to MASTER.
A MASTER can be removed from the cluster by downgrading it to a SLAVE.
Upgrades and downgrades will put nodes in the cluster in a transition state
until the operation is complete.
For downgrades the transition state is fleeting unless one or more other
masters have not acknowledged the change.
For upgrades a background synchronization process must complete before the
transition can be said to be complete, and the node remains (really) a SLAVE
until that transition is complete.
.Sh USE CASES FOR A SOFT_MASTER
The SOFT_MASTER PFS type is a special type which must be specifically
mounted by a machine.
It is a R/W mount which does not use the quorum protocol and is not
cache coherent with the cluster, but which synchronizes from the cluster
and allows modifying operations which will synchronize to the cluster.
The most common case is to use a SOFT_MASTER PFS on a laptop, allowing you
to work on your laptop when you are on the road and not connected to
your main servers, and for the laptop to synchronize when a connection is
available.
.Sh USE CASES FOR A SOFT_SLAVE
A SOFT_SLAVE PFS type is a special type which must be specifically mounted
by a machine.
It is a RO mount which does not use the quorum protocol and is not
cache coherent with the cluster.  It will receive synchronization from
the cluster when network connectivity is available but will not stall if
network connectivity is lost.
.Sh FSYNC FLUSH MODES
TODO.
.Sh RESTORING FROM A SNAPSHOT BACKUP
TODO.
.Sh PERFORMANCE TUNING
Because HAMMER2 implements compression, decompression, and dedup natively,
it always double-buffers file data.  This means that the file data is
cached via the device vnode (in compressed / dedupped form) and the same
data is also cached by the file vnode (in decompressed / non-dedupped form).
.Pp
While HAMMER2 will try to age the logical file buffers on its own, some
additional performance tuning may be necessary for optimal operation
whether swapcache is used or not.  Our recommendation is to reduce the
number of vnodes (and thus also the logical buffer cache behind the
vnodes) that the system caches via the
.Va kern.maxvnodes
sysctl.
.Pp
Too large a value will result in excessive double-caching and can cause
unnecessary read disk I/O.
We recommend a number between 25000 and 250000 vnodes, depending on your
use case.
Keep in mind that even though the vnode cache is smaller, this will make
room for a great deal more device-level buffer caching which can encompass
far more data and meta-data than the vnode-level caching.
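.Pp
For example, on a machine used mostly for serving a HAMMER2 filesystem you
might cap the vnode cache toward the lower end of that range (the exact
value is workload dependent and shown only for illustration):
.Bd -literal -offset indent
# show the current limit, then reduce it
sysctl kern.maxvnodes
sysctl kern.maxvnodes=100000
.Ed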
.Sh ENVIRONMENT
TODO.
.Sh FILES
.Bl -tag -width ".It Pa <fs>/abc/defghi/<name>" -compact
.It Pa /etc/hammer2/
.It Pa /etc/hammer2/rsa.pub
.It Pa /etc/hammer2/rsa.prv
.It Pa /etc/hammer2/autoconn
.It Pa /etc/hammer2/remote/<IP>.pub
.It Pa /etc/hammer2/remote/<IP>.none
.El
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr mount_hammer2 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer2 8 ,
.Xr swapcache 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Dx 4.1 .
.Sh AUTHORS
.An Matthew Dillon Aq Mt dillon@backplane.com