.\" Copyright (c) 2015 The DragonFly Project.  All rights reserved.
.\"
.\" This code is derived from software contributed to The DragonFly Project
.\" by Matthew Dillon <dillon@backplane.com>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 2, 2017
.Dt HAMMER2 8
.Os
.Sh NAME
.Nm hammer2
.Nd hammer2 file system utility
.Sh SYNOPSIS
.Nm
.Fl h
.Nm
.Op Fl s Ar path
.Op Fl t Ar type
.Op Fl u Ar uuid
.Ar command
.Op Ar argument ...
.Sh DESCRIPTION
The
.Nm
utility provides miscellaneous support functions for a
HAMMER2 file system.
.Pp
The options are as follows:
.Bl -tag -width indent
.It Fl s Ar path
Specify the path to a mounted HAMMER2 filesystem.
At least one PFS on a HAMMER2 filesystem must be mounted for the system
to act on all PFSs managed by it.
Every HAMMER2 filesystem typically has a PFS called "LOCAL" for this purpose.
.It Fl t Ar type
Specify the type when creating, upgrading, or downgrading a PFS.
Supported types are MASTER, SLAVE, SOFT_MASTER, SOFT_SLAVE, CACHE, and DUMMY.
If not specified the pfs-create directive will default to MASTER if no
uuid is specified, and SLAVE if a uuid is specified.
.It Fl u Ar uuid
Specify the cluster uuid when creating a PFS.  If not specified, a unique,
random uuid will be generated.
Note that every PFS also has a unique pfs_id which is always generated
and cannot be overridden with an option.
The { pfs_clid, pfs_fsid } tuple uniquely identifies a component of a cluster.
.El
.Pp
.Nm
directives are as shown below.
Note that most directives require you to either be CD'd into a hammer2
filesystem, specify a path to a mounted hammer2 filesystem via the
.Fl s
option, or specify a path after the directive.
It depends on the directive.
All hammer2 filesystems have a PFS called "LOCAL" which is typically mounted
locally on the host in order to be able to issue commands for other PFSs
on the filesystem.
The mount also enables PFS configuration scanning for that filesystem.
.Bl -tag -width indent
.\" ==== cleanup ====
.It Cm cleanup Op Ar path
Perform manual cleanup passes on paths or all mounted partitions.
.\" ==== connect ====
.It Cm connect Ar target
Add a cluster link entry to the volume header.
The volume header can support up to 255 link entries.
This feature is not currently used.
.\" ==== destroy ====
.It Cm destroy Ar path
Destroy the specified directory entry in a hammer2 filesystem.  This bypasses
all normal checks and will unconditionally destroy the directory entry.
The underlying inode is not checked and, if it does exist, its nlinks count
is not decremented.
This directive should only be used to destroy a corrupted directory entry
which no longer has a working inode.
.Pp
Note that this command may desynchronize the system namecache for the
specified entry.  If this happens, you may have to unmount and remount the
filesystem.
.\" ==== disconnect ====
.It Cm disconnect Ar target
Delete a cluster link entry from the volume header.
This feature is not currently used.
.\" ==== info ====
.It Cm info Op Ar devpath
Access and print the status and super-root entries for all HAMMER2
partitions found in /dev/serno or the specified device path(s).
The partitions do not have to be mounted.
Note that only mounted partitions will be under active management.
This is accomplished by mounting at least one PFS within the partition.
Typically at least the @LOCAL PFS is mounted.
.\" ==== mountall ====
.It Cm mountall Op Ar devpath
This directive mounts the @LOCAL PFS on all HAMMER2 partitions found
in /dev/serno, or the specified device path(s).
The partitions are mounted as /var/hammer2/LOCAL.<id>.
Mounts are executed in the background and this command will wait a
limited amount of time for the mounts to complete before returning.
.\" ==== status ====
.It Cm status Ar path ...
Dump a list of all cluster link entries configured in the volume header.
.\" ==== hash ====
.It Cm hash Ar filename ...
Compute and print the directory hash for any number of filenames.
.\" ==== pfs-list ====
.It Cm pfs-list Op Ar path ...
List all local PFSs available on a mounted HAMMER2 filesystem, their type,
and their current status.
You must mount at least one PFS in order to be able to access the whole list.
.\" ==== pfs-clid ====
.It Cm pfs-clid Ar label
Print the cluster id for a PFS specified by name.
.\" ==== pfs-fsid ====
.It Cm pfs-fsid Ar label
Print the unique filesystem id for a PFS specified by name.
.\" ==== pfs-create ====
.It Cm pfs-create Ar label
Create a local PFS on a mounted HAMMER2 filesystem.
If no uuid is specified the pfs-type defaults to MASTER.
If a uuid is specified via the
.Fl u
option the pfs-type defaults to SLAVE.
Other types can be specified with the
.Fl t
option.
.Pp
If you wish to add a MASTER to an existing cluster, you must first add it as
a SLAVE and then upgrade it to MASTER to properly synchronize it.
.Pp
The DUMMY pfs-type is used to tie network-accessible clusters into the local
machine when no local storage is desired.
This type should be used on minimal H2 partitions or entirely in ram for
netboot-centric systems to provide a tie-in point for the mount command,
or on more complex systems where you need to also access network-centric
clusters.
.Pp
The CACHE or SLAVE pfs-type is typically used when the main store is on
the network but local storage is desired to improve performance.
SLAVE is also used when a backup is desired.
.Pp
Generally speaking, you can mount any PFS element of a cluster in order to
access the cluster via the full cluster protocol.
There are two exceptions.
If you mount a SOFT_SLAVE or a SOFT_MASTER then soft quorum semantics are
employed: the soft slave or soft master's current state will always be used
and the quorum protocol will not be used.  The soft PFS will still be
synchronized to masters in the background when available.
Also, you can use
.Sq mount -o local
to mount ONLY a local HAMMER2 PFS and
not run any network or quorum protocols for the mount.
All such mounts except for a SOFT_MASTER mount will be read-only.
Other than that, you will be mounting the whole cluster when you mount any
PFS within the cluster.
.Pp
DUMMY - Create a PFS skeleton intended to be the mount point for a
more complex cluster, probably one that is entirely network based.
No data will be synchronized to this PFS so it is suitable for use
in a network boot image or memory filesystem.
This allows you to create placeholders for mount points on your local
disk, SSD, or memory disk.
.Pp
CACHE - Create a PFS for caching portions of the cluster piecemeal.
This is similar to a SLAVE but does not synchronize the entire contents of
the cluster to the PFS.
Elements found in the CACHE PFS which are validated against the cluster
will be read, presumably a faster access than having to go to the cluster.
Only local CACHEs will be updated.
Network-accessible CACHE PFSs might be read but will not be written to.
If you have a large hard-drive-based cluster you can set up localized
SSD CACHE PFSs to improve performance.
.Pp
SLAVE - Create a PFS which maintains synchronization with and provides a
read-only copy of the cluster.
HAMMER2 will prioritize local SLAVEs for data retrieval after validating
their transaction id against the cluster.
The difference between a CACHE and a SLAVE is that the SLAVE is synchronized
to a full copy of the cluster and thus can serve as a backup or be staged
for use as a MASTER later on.
.Pp
SOFT_SLAVE - Create a PFS which maintains synchronization with and provides
a read-only copy of the cluster.
This is one of the special mount cases.  A SOFT_SLAVE will synchronize with
the cluster when the cluster is available, but can still be accessed when
the cluster is not available.
.Pp
MASTER - Create a PFS which will hold a master copy of the cluster.
If you create several MASTER PFSs with the same cluster id you are
effectively creating a multi-master cluster and causing a quorum and
cache coherency protocol to be used to validate operations.
The total number of masters is stored in each PFS making up the cluster.
Filesystem operations will stall for normal mounts if a quorum cannot be
obtained to validate the operation.
MASTER nodes which go offline and return later will synchronize in the
background.
Note that when adding a MASTER to an existing cluster you must add the
new PFS as a SLAVE and then upgrade it to a MASTER.
.Pp
SOFT_MASTER - Create a PFS which maintains synchronization with and provides
a read-write copy of the cluster.
This is one of the special mount cases.  A SOFT_MASTER will synchronize with
the cluster when the cluster is available, but can still be read AND written
to even when the cluster is not available.
Modifications made to a SOFT_MASTER will be automatically flushed to the
cluster when it becomes accessible again, and vice versa.
Manual intervention may be required if a conflict occurs during
synchronization.
.\" ==== pfs-delete ====
.It Cm pfs-delete Ar label
Delete a local PFS on a mounted HAMMER2 filesystem.
Deleting a PFS of type MASTER requires first downgrading it to a SLAVE (XXX).
.\" ==== snapshot ====
.It Cm snapshot Ar path Op Ar label
Create a snapshot of a directory.
This can only be used on a local PFS, and is only really useful if the PFS
contains a complete copy of what you desire to snapshot, which typically
means that a local MASTER, SOFT_MASTER, SLAVE, or SOFT_SLAVE must be present.
Snapshots are created simply by flushing a PFS mount to disk and then copying
the directory inode to the PFS.
The topology is snapshotted without having to be copied or scanned.
Snapshots are effectively separate from the cluster they came from
and can be used as a starting point for a new cluster.
Unless you build a new cluster from the snapshot, it will stay local
to the machine it was made on.
.\" ==== service ====
.It Cm service
Start the
.Nm
service daemon.
This daemon is also automatically started when you run
.Xr mount_hammer2 8 .
The hammer2 service daemon handles incoming TCP connections and maintains
outgoing TCP connections.  It will interconnect available services on the
machine (e.g. hammer2 mounts and xdisks) to the network.
.\" ==== stat ====
.It Cm stat Op Ar path ...
Print the inode statistics, compression, and other meta-data associated
with a list of paths.
.\" ==== leaf ====
.It Cm leaf
XXX
.\" ==== shell ====
.It Cm shell
Start a debug shell to the local hammer2 service daemon via the DMSG protocol.
.\" ==== debugspan ====
.It Cm debugspan
(do not use)
.\" ==== rsainit ====
.It Cm rsainit
Create the
.Pa /etc/hammer2
directory and initialize a public/private keypair in that directory for
use by the network cluster protocols.
.\" ==== show ====
.It Cm show Ar devpath
Dump the radix tree for the HAMMER2 filesystem by scanning a
block device directly.  No mount is required.
.\" ==== freemap ====
.It Cm freemap Ar devpath
Dump the freemap tree for the HAMMER2 filesystem by scanning a
block device directly.  No mount is required.
.\" ==== setcomp ====
.It Cm setcomp Ar mode[:level] Op Ar path ...
Set the compression mode as specified for any newly created elements at or
under the path if not overridden by deeper elements.
Available modes are none, autozero, lz4, or zlib.
When zlib is used the compression level can be set.
The default will be 6 which is the best trade-off between compression ratio
and time.
.Pp
.Xr newfs_hammer2 8
will set the default compression to lz4 which prioritizes
speed over compression ratio.
Also note that HAMMER2 contains a heuristic and will not attempt to
compress every block if it detects a sufficient amount of incompressible
data.
.Pp
HAMMER2 compression is only effective when it can reduce the size of a data
block (typically 64KB) by one or more powers of 2.  A 64KB block which
only compresses to 40KB will not yield any storage improvement.
.Pp
Generally speaking you do not want to set the compression mode to
.Sq none ,
as this will cause blocks of all-zeros to be written as all-zero blocks,
instead of holes.  The
.Sq autozero
compression mode detects blocks of all-zeros
and writes them as holes.  However, HAMMER2 will rewrite data in-place if
the compression mode is set to
.Sq none
and the check code is set to
.Sq disabled .
Formal snapshots will still snapshot such files.  However,
de-duplication will no longer function on the data blocks.
.\" ==== setcheck ====
.It Cm setcheck Ar check Op Ar path ...
Set the check code as specified for any newly created elements at or under
the path if not overridden by deeper elements.
Available codes are default, disabled, crc32, xxhash64, or sha192.
.\" ==== clrcheck ====
.It Cm clrcheck Op Ar path ...
Clear the check code override for the specified paths.
Overrides may still be present in deeper elements.
.\" ==== setcrc32 ====
.It Cm setcrc32 Op Ar path ...
Set the check code to the iSCSI 32-bit CRC for any newly created elements
at or under the path if not overridden by deeper elements.
.\" ==== setxxhash64 ====
.It Cm setxxhash64 Op Ar path ...
Set the check code to XXHASH64, a fast 64-bit hash, for any newly created
elements at or under the path if not overridden by deeper elements.
.\" ==== setsha192 ====
.It Cm setsha192 Op Ar path ...
Set the check code to SHA192 for any newly created elements at or under
the path if not overridden by deeper elements.
.\" ==== bulkfree ====
.It Cm bulkfree Op Ar path ...
Run a bulkfree pass on a HAMMER2 mount.
You can specify any PFS for the mount; the bulkfree pass is run on the
entire partition.
Note that it takes two passes to actually free space.
.El
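.Pp
As a rough illustration of a few of the directives above, the following
sketch assumes a HAMMER2 PFS mounted at
.Pa /mnt/h2
with a directory
.Pa /mnt/h2/archive
beneath it.
Both paths are hypothetical and only the options and directives documented
in this page are used.
.Bd -literal -offset indent
# List the PFSs visible through the mounted filesystem.
hammer2 -s /mnt/h2 pfs-list
# Request zlib at level 6 for newly created elements under a directory.
hammer2 setcomp zlib:6 /mnt/h2/archive
# Run bulkfree twice, since it takes two passes to actually free space.
hammer2 bulkfree /mnt/h2
hammer2 bulkfree /mnt/h2
.Ed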
.Sh SYSCTLS
.Bl -tag -width indent
.It Va vfs.hammer2.dedup_enable (default on)
Enables live de-duplication.  Any recently read data that is on-media
(already synchronized to media) is tested against pending writes for
compatibility.  If a match is found, the write will reference the
existing on-media data instead of writing new data.
.It Va vfs.hammer2.always_compress (default off)
This disables the H2 compression heuristic and forces H2 to always
try to compress data blocks, even if they look incompressible.
Enabling this option reduces performance but has higher de-duplication
repeatability.
.It Va vfs.hammer2.cluster_data_read (default 4)
.It Va vfs.hammer2.cluster_meta_read (default 1)
Set the amount of read-ahead clustering to perform on data and meta-data
blocks.
.It Va vfs.hammer2.cluster_write (default 4)
Set the amount of write-behind clustering to perform in buffers.  Each
buffer represents 64KB.  The default is 4 and higher values typically do
not improve performance.  A value of 0 disables clustered writes.
This variable applies to the underlying media device, not to logical
file writes, so it should not interfere with temporary file optimization.
Generally speaking you want this enabled to generate smoothly pipelined
writes to the media.
.It Va vfs.hammer2.bulkfree_tps (default 5000)
Set bulkfree's maximum scan rate.  This is primarily intended to limit
I/O utilization on SSDs and cpu utilization when the meta-data is mostly
cached in memory.
.El
.Sh SETTING UP /etc/hammer2
The
.Sq rsainit
directive will create the
.Pa /etc/hammer2
directory with appropriate permissions and also generate a public/private
key pair in this directory for the machine.  These files will be
.Pa rsa.pub
and
.Pa rsa.prv
and needless to say, the private key shouldn't leave the host.
.Pp
The service daemon will also scan the
.Pa /etc/hammer2/autoconn
file which contains a list of hosts to which it will automatically maintain
connections in order to form your cluster.
The service daemon will automatically reconnect on any failure and will
also monitor the file for changes.
.Pp
When the service daemon receives a connection it expects to find a
public key for that connection in a file in
.Pa /etc/hammer2/remote/
called
.Pa <IPADDR>.pub .
You normally copy the
.Pa rsa.pub
key from the host in question to this file.
The IP address must match exactly or the connection will not be allowed.
.Pp
If you want to use an unencrypted connection you can create empty,
dummy files in the remote directory in the form
.Pa <IPADDR>.none .
We do not recommend using unencrypted connections.
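.Pp
The key setup described above might look roughly like the following sketch.
The host name
.Sq peer
and the address 192.0.2.10 are hypothetical placeholders, the use of
.Xr scp 1
to copy the key is an assumption, and the
.Xr mkdir 1
step is only a precaution in case the remote directory does not already exist.
.Bd -literal -offset indent
# Create /etc/hammer2 and generate the local rsa.pub / rsa.prv pair.
hammer2 rsainit
# Make sure the directory for remote public keys exists.
mkdir -p /etc/hammer2/remote
# Install the peer's public key under the exact IP address it connects from.
scp peer:/etc/hammer2/rsa.pub /etc/hammer2/remote/192.0.2.10.pub
.Ed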
.Sh CLUSTER SERVICES
Currently there are two services which use the cluster network infrastructure,
HAMMER2 mounts and XDISK.
Any HAMMER2 mount will make all PFSs for that filesystem available to the
cluster.
And if the XDISK kernel module is loaded, the hammer2 service daemon will make
your machine's block devices available to the cluster (you must load the
xdisk.ko kernel module before starting the hammer2 service).
They will show up as
.Pa /dev/xa*
and
.Pa /dev/serno/*
devices on the remote machines making up the cluster.
Remote block devices are just what they appear to be: direct access to a
block device on a remote machine.  If the link goes down remote accesses
will stall until it comes back up again, then automatically requeue any
pending I/O and resume as if nothing happened.
However, if the server hosting the physical disks crashes or is rebooted,
any remote opens to its devices will see a permanent I/O failure requiring a
close and open sequence to re-establish.
The latter is necessary because the server's drives might not have committed
the data before the crash, but had already acknowledged the transfer.
.Pp
Data commits work exactly the same as they do for real block devices.
The originator must issue a BUF_CMD_FLUSH.
.Sh ADDING A NEW MASTER TO A CLUSTER
When you
.Xr newfs_hammer2 8
a HAMMER2 filesystem or use the
.Sq pfs-create
directive on one already mounted
to create a new PFS, with no special options, you wind up with a PFS
typed as a MASTER and a unique cluster uuid, but because there is only one
PFS for that cluster (for each PFS you create via pfs-create), it will
act just like a normal filesystem would act and does not require any special
protocols to operate.
.Pp
If you use the
.Sq pfs-create
directive along with the
.Fl u
option to specify a cluster uuid that already exists in the cluster,
you are adding a PFS to an existing cluster and this can trigger a whole
series of events in the background.
When you specify the
.Fl u
option in a
.Sq pfs-create ,
.Nm
will by default create a SLAVE PFS.
In fact, this is what must be created first even if you want to add a new
MASTER to your cluster.
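.Pp
A minimal sketch of that first step follows.
It assumes an existing PFS labeled DATA reachable through a mount at
.Pa /mnt/h2 ,
and a second HAMMER2 filesystem mounted at
.Pa /mnt/h2b
that will hold the new copy under the label DATA2.
The paths and labels are hypothetical, and <cluster-uuid> stands for
whatever the first command prints.
.Bd -literal -offset indent
# Print the cluster id of the existing PFS.
hammer2 -s /mnt/h2 pfs-clid DATA
# Create a PFS with the same cluster id; with -u it defaults to SLAVE.
hammer2 -s /mnt/h2b -u <cluster-uuid> pfs-create DATA2
.Ed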
.Pp
The most common action a system admin will want to take is to upgrade or
downgrade a PFS.
A new MASTER can be added to the cluster by upgrading an existing SLAVE
to MASTER.
A MASTER can be removed from the cluster by downgrading it to a SLAVE.
Upgrades and downgrades will put nodes in the cluster in a transition state
until the operation is complete.
For downgrades the transition state is fleeting unless one or more other
masters have not acknowledged the change.
For upgrades a background synchronization process must complete before the
transition can be said to be complete, and the node remains (really) a SLAVE
until that transition is complete.
.Sh USE CASES FOR A SOFT_MASTER
The SOFT_MASTER PFS type is a special type which must be specifically
mounted by a machine.
It is a R/W mount which does not use the quorum protocol and is not
cache coherent with the cluster, but which synchronizes from the cluster
and allows modifying operations which will synchronize to the cluster.
The most common case is to use a SOFT_MASTER PFS in a laptop allowing you
to work on your laptop when you are on the road and not connected to
your main servers, and for the laptop to synchronize when a connection is
available.
.Sh USE CASES FOR A SOFT_SLAVE
A SOFT_SLAVE PFS type is a special type which must be specifically mounted
by a machine.
It is a RO mount which does not use the quorum protocol and is not
cache coherent with the cluster.  It will receive synchronization from
the cluster when network connectivity is available but will not stall if
network connectivity is lost.
.Sh FSYNC FLUSH MODES
TODO.
.Sh RESTORING FROM A SNAPSHOT BACKUP
TODO.
.Sh PERFORMANCE TUNING
Because HAMMER2 implements compression, decompression, and dedup natively,
it always double-buffers file data.  This means that the file data is
cached via the device vnode (in compressed / dedupped-form) and the same
data is also cached by the file vnode (in decompressed / non-dedupped form).
.Pp
While HAMMER2 will try to age the logical file buffers on its own, some
additional performance tuning may be necessary for optimal operation
whether swapcache is used or not.  Our recommendation is to reduce the
number of vnodes (and thus also the logical buffer cache behind the
vnodes) that the system caches via the
.Va kern.maxvnodes
sysctl.
.Pp
Too large a value will result in excessive double-caching and can cause
unnecessary read disk I/O.
We recommend a number between 25000 and 250000 vnodes, depending on your
use case.
Keep in mind that even though the vnode cache is smaller, this will make
room for a great deal more device-level buffer caching which can encompass
far more data and meta-data than the vnode-level caching.
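.Pp
For instance, a value inside the recommended range could be applied at run
time with
.Xr sysctl 8
as sketched below.
The figure 100000 is an arbitrary illustration, not a tuned recommendation.
.Bd -literal -offset indent
# Limit the number of cached vnodes.
sysctl kern.maxvnodes=100000
.Ed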
.Sh ENVIRONMENT
TODO.
.Sh FILES
.Bl -tag -width ".It Pa <fs>/abc/defghi/<name>" -compact
.It Pa /etc/hammer2/
.It Pa /etc/hammer2/rsa.pub
.It Pa /etc/hammer2/rsa.prv
.It Pa /etc/hammer2/autoconn
.It Pa /etc/hammer2/remote/<IP>.pub
.It Pa /etc/hammer2/remote/<IP>.none
.El
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr mount_hammer2 8 ,
.Xr mount_null 8 ,
.Xr newfs_hammer2 8 ,
.Xr swapcache 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Dx 4.1 .
.Sh AUTHORS
.An Matthew Dillon Aq Mt dillon@backplane.com