1.\"  Hey, Emacs, edit this file in -*- nroff-fill -*- mode
2.\"-
3.\" Copyright (c) 1997, 1998
4.\"	Nan Yang Computer Services Limited.  All rights reserved.
5.\"
6.\"  This software is distributed under the so-called ``Berkeley
7.\"  License'':
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"	This product includes software developed by Nan Yang Computer
20.\"      Services Limited.
21.\" 4. Neither the name of the Company nor the names of its contributors
22.\"    may be used to endorse or promote products derived from this software
23.\"    without specific prior written permission.
24.\"
25.\" This software is provided ``as is'', and any express or implied
26.\" warranties, including, but not limited to, the implied warranties of
27.\" merchantability and fitness for a particular purpose are disclaimed.
28.\" In no event shall the company or contributors be liable for any
29.\" direct, indirect, incidental, special, exemplary, or consequential
30.\" damages (including, but not limited to, procurement of substitute
31.\" goods or services; loss of use, data, or profits; or business
32.\" interruption) however caused and on any theory of liability, whether
33.\" in contract, strict liability, or tort (including negligence or
34.\" otherwise) arising in any way out of the use of this software, even if
35.\" advised of the possibility of such damage.
36.\"
37.\" $Id: vinum.8,v 1.48 2001/01/15 22:15:05 grog Exp $
38.\" $FreeBSD: src/sbin/vinum/vinum.8,v 1.33.2.10 2002/12/29 16:35:38 schweikh Exp $
39.\"
40.Dd December 20, 2000
41.Dt VINUM 8
42.Os
43.Sh NAME
44.Nm vinum
45.Nd Logical Volume Manager control program
46.Sh SYNOPSIS
47.Nm
48.Op Ar command
49.Op Fl options
50.Sh COMMANDS
51.Bl -tag -width indent
52.It Ic attach Ar plex volume Op Cm rename
53.It Xo
54.Ic attach Ar subdisk plex
55.Op Ar offset
56.Op Cm rename
57.Xc
58Attach a plex to a volume, or a subdisk to a plex.
59.It Xo
60.Ic checkparity Ar plex
61.Op Fl f
62.Op Fl v
63.Xc
64Check the parity blocks of a RAID-4 or RAID-5 plex.
65.It Xo
66.Ic concat
67.Op Fl f
68.Op Fl n Ar name
69.Op Fl v
70.Ar drives
71.Xc
72Create a concatenated volume from the specified drives.
73.It Xo
74.Ic create
75.Op Fl f
76.Ar description-file
77.Xc
78Create a volume as described in
79.Ar description-file .
80.It Ic debug
81Cause the volume manager to enter the kernel debugger.
82.It Ic debug Ar flags
83Set debugging flags.
84.It Xo
85.Ic detach
86.Op Fl f
87.Op Ar plex | subdisk
88.Xc
89Detach a plex or subdisk from the volume or plex to which it is attached.
90.It Ic dumpconfig Op Ar drive ...
91List the configuration information stored on the specified drives, or all drives
92in the system if no drive names are specified.
93.It Xo
94.Ic info
95.Op Fl v
96.Op Fl V
97.Xc
98List information about volume manager state.
99.It Xo
100.Ic init
101.Op Fl S Ar size
102.Op Fl w
103.Ar plex | subdisk
104.Xc
105.\" XXX
106Initialize the contents of a subdisk or all the subdisks of a plex to all zeros.
107.It Ic label Ar volume
108Create a volume label.
109.It Xo
110.Ic l | list
111.Op Fl r
112.Op Fl s
113.Op Fl v
114.Op Fl V
115.Op Ar volume | plex | subdisk
116.Xc
117List information about specified objects.
118.It Xo
119.Ic ld
120.Op Fl r
121.Op Fl s
122.Op Fl v
123.Op Fl V
.Op Ar drive
125.Xc
126List information about drives.
127.It Xo
128.Ic ls
129.Op Fl r
130.Op Fl s
131.Op Fl v
132.Op Fl V
133.Op Ar subdisk
134.Xc
135List information about subdisks.
136.It Xo
137.Ic lp
138.Op Fl r
139.Op Fl s
140.Op Fl v
141.Op Fl V
142.Op Ar plex
143.Xc
144List information about plexes.
145.It Xo
146.Ic lv
147.Op Fl r
148.Op Fl s
149.Op Fl v
150.Op Fl V
151.Op Ar volume
152.Xc
153List information about volumes.
154.It Ic makedev
155Remake the device nodes in
156.Pa /dev/vinum .
157.It Xo
158.Ic mirror
159.Op Fl f
160.Op Fl n Ar name
161.Op Fl s
162.Op Fl v
163.Ar drives
164.Xc
165Create a mirrored volume from the specified drives.
166.It Xo
167.Ic move | mv
168.Fl f
169.Ar drive object ...
170.Xc
171Move the object(s) to the specified drive.
172.It Ic printconfig Op Ar file
173Write a copy of the current configuration to
174.Ar file .
175.It Ic quit
176Exit the
177.Nm
178program when running in interactive mode.  Normally this would be done by
179entering the
180.Dv EOF
181character.
182.It Ic read Ar disk ...
183Read the
184.Nm
185configuration from the specified disks.
186.It Xo
187.Ic rename Op Fl r
188.Op Ar drive | subdisk | plex | volume
189.Ar newname
190.Xc
191Change the name of the specified object.
192.\" XXX
193.\".It Ic replace Ar drive newdrive
194.\"Move all the subdisks from the specified drive onto the new drive.
195.It Xo
196.Ic rebuildparity Ar plex Op Fl f
197.Op Fl v
198.Op Fl V
199.Xc
200Rebuild the parity blocks of a RAID-4 or RAID-5 plex.
201.It Ic resetconfig
202Reset the complete
203.Nm
204configuration.
205.It Xo
206.Ic resetstats
207.Op Fl r
208.Op Ar volume | plex | subdisk
209.Xc
210Reset statistics counters for the specified objects, or for all objects if none
211are specified.
212.It Xo
213.Ic rm
214.Op Fl f
215.Op Fl r
216.Ar volume | plex | subdisk
217.Xc
218Remove an object.
219.It Ic saveconfig
220Save
221.Nm
222configuration to disk after configuration failures.
223.\" XXX
224.\".It Xo
225.\".Ic set
226.\".Op Fl f
227.\".Ar state
228.\".Ar volume | plex | subdisk | disk
229.\".Xc
230.\"Set the state of the object to
231.\".Ar state .
232.It Ic setdaemon Op Ar value
233Set daemon configuration.
234.It Xo
235.Ic setstate
236.Ar state
237.Op Ar volume | plex | subdisk | drive
238.Xc
239Set state without influencing other objects, for diagnostic purposes only.
240.It Ic start
241Read configuration from all vinum drives.
242.It Xo
243.Ic start
244.Op Fl i Ar interval
245.Op Fl S Ar size
246.Op Fl w
247.Ar volume | plex | subdisk
248.Xc
249Allow the system to access the objects.
250.It Xo
251.Ic stop
252.Op Fl f
253.Op Ar volume | plex | subdisk
254.Xc
255Terminate access to the objects, or stop
256.Nm
257if no parameters are specified.
258.It Xo
259.Ic stripe
260.Op Fl f
261.Op Fl n Ar name
262.Op Fl v
263.Ar drives
264.Xc
265Create a striped volume from the specified drives.
266.El
267.Sh DESCRIPTION
268.Nm
269is a utility program to communicate with the
270.Xr vinum 4
271logical volume
272manager.
273.Nm
274is designed either for interactive use, when started without command line
275arguments, or to execute a single command if the command is supplied on the
276command line.  In interactive mode,
277.Nm
278maintains a command line history.
279.Sh OPTIONS
280.Nm
281commands may optionally be followed by an option.  Any of the following options
282may be specified with any command, but in some cases the options are ignored.
283For example, the
284.Ic stop
285command ignores the
286.Fl v
287and
288.Fl V
289options.
290.Bl -tag -width indent
291.It Fl f
292The
293.Fl f
294.Pq Dq force
295option overrides safety checks.  Use with extreme care.  This option is for
296emergency use only.  For example, the command
297.Pp
298.Dl rm -f myvolume
299.Pp
300removes
301.Ar myvolume
302even if it is open.  Any subsequent access to the volume will almost certainly
303cause a panic.
304.It Fl i Ar millisecs
305When performing the
306.Ic init
307and
308.Ic start
309commands, wait
310.Ar millisecs
311milliseconds between copying each block.  This lowers the load on the system.
312.It Fl n Ar name
313Use the
314.Fl n
315option to specify a volume name to the simplified configuration commands
316.Ic concat , mirror
317and
318.Ic stripe .
319.It Fl r
320The
321.Fl r
322.Pq Dq recursive
323option is used by the list commands to display information not
324only about the specified objects, but also about subordinate objects.  For
325example, in conjunction with the
326.Ic lv
327command, the
328.Fl r
329option will also show information about the plexes and subdisks belonging to the
330volume.
331.It Fl s
332The
333.Fl s
334.Pq Dq statistics
335option is used by the list commands to display statistical information.  The
336.Ic mirror
337command also uses this option to specify that it should create striped plexes.
338.It Fl S Ar size
339The
340.Fl S
341option specifies the transfer size for the
342.Ic init
343and
344.Ic start
345commands.
346.It Fl v
347The
348.Fl v
349.Pq Dq verbose
350option can be used to request more detailed information.
351.It Fl V
352The
353.Fl V
354.Pq Dq Very verbose
355option can be used to request more detailed information than the
356.Fl v
357option provides.
358.It Fl w
359The
360.Fl w
361.Pq Dq wait
362option tells
363.Nm
364to wait for completion of commands which normally run in the background, such as
365.Ic init .
366.El
367.Sh COMMANDS IN DETAIL
368.Nm
369commands perform the following functions:
370.Pp
371.Bl -tag -width indent -compact
372.It Ic attach Ar plex volume Op Cm rename
373.It Xo
374.Ic attach Ar subdisk plex
375.Op Ar offset
376.Op Cm rename
377.Xc
378.Nm Ic attach
379inserts the specified plex or subdisk in a volume or plex.  In the case of a
380subdisk, an offset in the plex may be specified.  If it is not, the subdisk will
381be attached at the first possible location.  After attaching a plex to a
382non-empty volume,
383.Nm
384reintegrates the plex.
385.Pp
386If the keyword
387.Cm rename
388is specified,
389.Nm
390renames the object (and in the case of a plex, any subordinate subdisks) to fit
391in with the default
392.Nm
393naming convention.  To rename the object to any other name, use the
394.Ic rename
395command.
396.Pp
397A number of considerations apply to attaching subdisks:
398.Bl -bullet
399.It
400Subdisks can normally only be attached to concatenated plexes.
401.It
402If a striped or RAID-5 plex is missing a subdisk (for example after drive
403failure), it should be replaced by a subdisk of the same size only.
404.It
405In order to add further subdisks to a striped or RAID-5 plex, use the
406.Fl f
407(force) option.  This will corrupt the data in the plex.
408.\"No other attachment of
409.\"subdisks is currently allowed for striped and RAID-5 plexes.
410.It
411For concatenated plexes, the
412.Ar offset
413parameter specifies the offset in blocks from the beginning of the plex.  For
414striped and RAID-5 plexes, it specifies the offset of the first block of the
415subdisk: in other words, the offset is the numerical position of the subdisk
416multiplied by the stripe size.  For example, in a plex with stripe size 271k,
417the first subdisk will have offset 0, the second offset 271k, the third 542k,
418etc.  This calculation ignores parity blocks in RAID-5 plexes.
419.El
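.Pp
As a worked example of the offset rule for striped and RAID-5 plexes, the
following sketch continues the 271k stripe size used above.  Subdisk positions
are counted from 0; the fourth row simply extends the series and is
illustrative only.
.Bd -literal -offset indent
position   offset = position * stripe size
0          0 * 271k =   0
1          1 * 271k = 271k
2          2 * 271k = 542k
3          3 * 271k = 813k
.Ed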
420.Pp
421.It Xo
422.Ic checkparity
423.Ar plex
424.Op Fl f
425.Op Fl v
426.Xc
427Check the parity blocks on the specified RAID-4 or RAID-5 plex.  This operation
428maintains a pointer in the plex, so it can be stopped and later restarted from
429the same position if desired.  In addition, this pointer is used by the
430.Ic rebuildparity
431command, so rebuilding the parity blocks need only start at the location where
432the first parity problem has been detected.
433.Pp
434If the
435.Fl f
436flag is specified,
437.Ic checkparity
438starts checking at the beginning of the plex.  If the
439.Fl v
440flag is specified,
441.Ic checkparity
442prints a running progress report.
443.Pp
444.It Xo
445.Ic concat
446.Op Fl f
447.Op Fl n Ar name
448.Op Fl v
449.Ar drives
450.Xc
451The
452.Ic concat
453command provides a simplified alternative to the
454.Ic create
command for creating volumes with a single concatenated plex.  The largest
contiguous space available on each drive is used to create the subdisks for the
plex.
458.Pp
459Normally, the
460.Ic concat
461command creates an arbitrary name for the volume and its components.  The name
462is composed of the text
463.Dq Li vinum
464and a small integer, for example
465.Dq Li vinum3 .
466You can override this with the
467.Fl n Ar name
468option, which assigns the name specified to the volume.  The plexes and subdisks
469are named after the volume in the default manner.
470.Pp
471There is no choice of name for the drives.  If the drives have already been
472initialized as
473.Nm
474drives, the name remains.  Otherwise the drives are given names starting with
475the text
476.Dq Li vinumdrive
477and a small integer, for example
478.Dq Li vinumdrive7 .
479As with the
480.Ic create
481command, the
482.Fl f
option can be used to specify that a previous name should be overwritten.  The
.Fl v
option requests verbose output.
486.Pp
487See the section
488.Sx SIMPLIFIED CONFIGURATION
489below for some examples of this
490command.
491.Pp
492.It Xo
493.Ic create
494.Op Fl f
495.Ar description-file
496.Xc
497.Nm Ic create
498is used to create any object.  In view of the relatively complicated
499relationship and the potential dangers involved in creating a
500.Nm
501object, there is no interactive interface to this function.  If you do not
502specify a file name,
503.Nm
504starts an editor on a temporary file.  If the environment variable
505.Ev EDITOR
506is set,
507.Nm
508starts this editor.  If not, it defaults to
.Xr vi 1 .
510See the section
511.Sx CONFIGURATION FILE
512below for more information on the format of
513this file.
514.Pp
515Note that the
516.Nm Ic create
517function is additive: if you run it multiple times, you will create multiple
518copies of all unnamed objects.
519.Pp
520Normally the
521.Ic create
522command will not change the names of existing
523.Nm
524drives, in order to avoid accidentally erasing them.  The correct way to dispose
525of no longer wanted
526.Nm
527drives is to reset the configuration with the
528.Ic resetconfig
529command.  In some cases, however, it may be necessary to create new data on
530.Nm
531drives which can no longer be started.  In this case, use the
532.Ic create Fl f
533command.
534.Pp
535.It Ic debug
.Nm Ic debug ,
without any arguments, enters the remote kernel debugger.  It is only
activated if
.Nm
is built with the
.Dv VINUMDEBUG
option.  This command stops the execution of the operating system until the
kernel debugger is exited.  If remote debugging is set and there is no remote
connection for a kernel debugger, it will be necessary to reset the system and
reboot in order to leave the debugger.
546.Pp
547.It Ic debug Ar flags
548Set a bit mask of internal debugging flags.  These will change without warning
549as the product matures; to be certain, read the header file
550.Aq Pa sys/dev/vinumvar.h .
551The bit mask is composed of the following values:
552.Bl -tag -width indent
553.It Dv DEBUG_ADDRESSES Pq No 1
554Show buffer information during requests
555.\".It Dv DEBUG_NUMOUTPUT Pq No 2
556.\"Show the value of
557.\".Va vp->v_numoutput .
558.It Dv DEBUG_RESID Pq No 4
559Go into debugger in
560.Fn complete_rqe .
561.It Dv DEBUG_LASTREQS Pq No 8
562Keep a circular buffer of last requests.
563.It Dv DEBUG_REVIVECONFLICT Pq No 16
564Print info about revive conflicts.
565.It Dv DEBUG_EOFINFO Pq No 32
566Print information about internal state when returning an
567.Dv EOF
568on a striped plex.
569.It Dv DEBUG_MEMFREE Pq No 64
570Maintain a circular list of the last memory areas freed by the memory allocator.
571.It Dv DEBUG_REMOTEGDB Pq No 256
572Go into remote
.Xr gdb 1
574when the
575.Ic debug
576command is issued.
577.It Dv DEBUG_WARNINGS Pq No 512
578Print some warnings about minor problems in the implementation.
579.El
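.Pp
Because the flags form a bit mask, several can be enabled at once by adding
their values.  The following sketch enables
.Dv DEBUG_LASTREQS
and
.Dv DEBUG_WARNINGS
together.  It assumes that
.Nm
accepts the mask as a decimal number, which this page does not state
explicitly, and that the values listed above still match the header file.
.Bd -literal -offset indent
DEBUG_LASTREQS     8
DEBUG_WARNINGS   512
                 ---
mask             520

vinum -> debug 520
.Ed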
580.Pp
581.It Ic detach Oo Fl f Oc Ar plex
582.It Ic detach Oo Fl f Oc Ar subdisk
583.Nm Ic detach
584removes the specified plex or subdisk from the volume or plex to which it is
585attached.  If removing the object would impair the data integrity of the volume,
586the operation will fail unless the
587.Fl f
588option is specified.  If the object is named after the object above it (for
589example, subdisk
590.Li vol1.p7.s0
591attached to plex
592.Li vol1.p7 ) ,
593the name will be changed
594by prepending the text
595.Dq Li ex-
596(for example,
597.Li ex-vol1.p7.s0 ) .
598If necessary, the name will be truncated in the
599process.
600.Pp
601.Ic detach
602does not reduce the number of subdisks in a striped or RAID-5 plex.  Instead,
603the subdisk is marked absent, and can later be replaced with the
604.Ic attach
605command.
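.Pp
A minimal illustration of the renaming rule, using the example names from
above (any output from the command is omitted):
.Bd -literal -offset indent
vinum -> detach vol1.p7.s0
.Ed
.Pp
After this command, the subdisk would be expected to appear under the name
.Li ex-vol1.p7.s0
in subsequent listings.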
606.Pp
607.It Ic dumpconfig Op Ar drive ...
608.Pp
609.Nm Ic dumpconfig
610shows the configuration information stored on the specified drives.  If no drive
611names are specified,
612.Ic dumpconfig
613searches all drives on the system for Vinum partitions and dumps the
614information.  If configuration updates are disabled, it is possible that this
615information is not the same as the information returned by the
616.Ic list
617command.  This command is used primarily for maintenance and debugging.
618.Pp
619.It Ic info
620.Nm Ic info
621displays information about
622.Nm
623memory usage.  This is intended primarily for debugging.  With the
624.Fl v
625option, it will give detailed information about the memory areas in use.
626.Pp
627With the
628.Fl V
629option,
630.Ic info
displays information about up to the last 64 I/O requests handled by the
.Nm
driver.  This information is only collected if debug flag 8
.Pq Dv DEBUG_LASTREQS
is set.  The format looks like:
.Bd -literal
vinum -> info -V
Flags: 0x200    1 opens
Total of 38 blocks malloced, total memory: 16460
Maximum allocs:       56, malloc table at 0xf0f72dbc

Time             Event       Buf        Dev     Offset          Bytes   SD      SDoff   Doffset Goffset

14:40:00.637758 1VS Write 0xf2361f40    91.3  0x10            16384
14:40:00.639280 2LR Write 0xf2361f40    91.3  0x10            16384
14:40:00.639294 3RQ Read  0xf2361f40    4.39   0x104109        8192    19      0       0       0
14:40:00.639455 3RQ Read  0xf2361f40    4.23   0xd2109         8192    17      0       0       0
14:40:00.639529 3RQ Read  0xf2361f40    4.15   0x6e109         8192    16      0       0       0
14:40:00.652978 4DN Read  0xf2361f40    4.39   0x104109        8192    19      0       0       0
14:40:00.667040 4DN Read  0xf2361f40    4.15   0x6e109         8192    16      0       0       0
14:40:00.668556 4DN Read  0xf2361f40    4.23   0xd2109         8192    17      0       0       0
14:40:00.669777 6RP Write 0xf2361f40    4.39   0x104109        8192    19      0       0       0
14:40:00.685547 4DN Write 0xf2361f40    4.39   0x104109        8192    19      0       0       0
11:11:14.975184 Lock      0xc2374210    2      0x1f8001
11:11:15.018400 7VS Write 0xc2374210           0x7c0           32768   10
11:11:15.018456 8LR Write 0xc2374210    13.39  0xcc0c9         32768
11:11:15.046229 Unlock    0xc2374210    2      0x1f8001
.Ed
658.Pp
659The
660.Ar Buf
field always contains the address of the user buffer header.  This can be used
to identify the requests associated with a user request, but it is not 100%
reliable: theoretically two requests in sequence could use the same buffer
header, though this is not common.  The beginning of a request can be identified
665by the event
666.Ar 1VS
667or
668.Ar 7VS .
669The first example above shows the requests involved in a user request.  The
670second is a subdisk I/O request with locking.
671.Pp
672The
673.Ar Event
674field contains information related to the sequence of events in the request
675chain.  The digit
676.Ar 1
677to
678.Ar 6
679indicates the approximate sequence of events, and the two-letter abbreviation is
680a mnemonic for the location:
681.Bl -tag -width Lockwait
682.It 1VS
683(vinumstrategy) shows information about the user request on entry to
684.Fn vinumstrategy .
685The device number is the
686.Nm
687device, and offset and length are the user parameters.  This is always the
688beginning of a request sequence.
689.It 2LR
690(launch_requests) shows the user request just prior to launching the low-level
691.Nm
692requests in the function
693.Fn launch_requests .
694The parameters should be the same as in the
695.Ar 1VS
696information.
697.El
698.Pp
699In the following requests,
700.Ar Dev
701is the device number of the associated disk partition,
702.Ar Offset
703is the offset from the beginning of the partition,
704.Ar SD
705is the subdisk index in
706.Va vinum_conf ,
707.Ar SDoff
708is the offset from the beginning of the subdisk,
709.Ar Doffset
710is the offset of the associated data request, and
711.Ar Goffset
712is the offset of the associated group request, where applicable.
713.Bl -tag -width Lockwait
714.It 3RQ
715(request) shows one of possibly several low-level
716.Nm
717requests which are launched to satisfy the high-level request.  This information
718is also logged in
719.Fn launch_requests .
720.It 4DN
721(done) is called from
722.Fn complete_rqe ,
723showing the completion of a request.  This completion should match a request
724launched either at stage
.Ar 3RQ
726from
727.Fn launch_requests ,
728or from
729.Fn complete_raid5_write
730at stage
731.Ar 5RD
732or
733.Ar 6RP .
734.It 5RD
735(RAID-5 data) is called from
736.Fn complete_raid5_write
737and represents the data written to a RAID-5 data stripe after calculating
738parity.
739.It 6RP
740(RAID-5 parity) is called from
741.Fn complete_raid5_write
742and represents the data written to a RAID-5 parity stripe after calculating
743parity.
744.It 7VS
745shows a subdisk I/O request.  These requests are usually internal to
746.Nm
747for operations like initialization or rebuilding plexes.
748.It 8LR
749shows the low-level operation generated for a subdisk I/O request.
750.It Lockwait
751specifies that the process is waiting for a range lock.  The parameters are the
752buffer header associated with the request, the plex number and the block number.
753For internal reasons the block number is one higher than the address of the
754beginning of the stripe.
.It Lock
specifies that a range lock has been obtained.  The parameters are the same as
for
.Ar Lockwait .
.It Unlock
specifies that a range lock has been released.  The parameters are the same as
for
.Ar Lockwait .
761.El
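.Pp
Since all events belonging to one user request normally carry the same
.Ar Buf
value, one way to isolate a request chain is to filter the output on that
address.  The following is only a sketch: it assumes that
.Nm
is run with a single command from a shell, and it uses the buffer address
.Li 0xf2361f40
from the first example above.
.Bd -literal -offset indent
# vinum info -V | grep 0xf2361f40
.Ed
.Pp
Subject to the caveat about reused buffer headers, the matching lines run
from the
.Ar 1VS
event to the final
.Ar 4DN
event of that request.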
762.\" XXX
763.Pp
764.It Xo
765.Ic init
766.Op Fl S Ar size
767.Op Fl w
768.Ar plex | subdisk
769.Xc
770.Nm Ic init
771initializes a subdisk by writing zeroes to it.  You can initialize all subdisks
772in a plex by specifying the plex name.  This is the only way to ensure
773consistent data in a plex.  You must perform this initialization before using a
774RAID-5 plex.  It is also recommended for other new plexes.
775.Nm
776initializes all subdisks of a plex in parallel.  Since this operation can take a
777long time, it is normally performed in the background.  If you want to wait for
778completion of the command, use the
779.Fl w
780(wait) option.
781.Pp
782Specify the
783.Fl S
784option if you want to write blocks of a different size from the default value of
78516 kB.
786.Nm
787prints a console message when the initialization is complete.
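.Pp
For instance, to initialize every subdisk of a RAID-5 plex and wait for the
operation to finish before returning, a command along these lines could be
used.  The plex name
.Li raid5vol.p0
is hypothetical.
.Bd -literal -offset indent
vinum -> init -w raid5vol.p0
.Ed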
788.Pp
789.It Ic label Ar volume
790The
791.Ic label
792command writes a
793.Em ufs
794style volume label on a volume.  It is a simple alternative to an appropriate
795call to
796.Ic disklabel .
797This is needed because some
798.Em ufs
799commands still read the disk to find the label instead of using the correct
800.Xr ioctl 2
801call to access it.
802.Nm
803maintains a volume label separately from the volume data, so this command is not
804needed for
805.Xr newfs 8 .
806This command is deprecated.
807.Pp
.It Xo
.Ic list
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic l
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
820.It Xo
821.Ic ld
822.Op Fl r
823.Op Fl s
824.Op Fl v
825.Op Fl V
.Op Ar drive
827.Xc
828.It Xo
829.Ic ls
830.Op Fl r
831.Op Fl s
832.Op Fl v
833.Op Fl V
834.Op Ar subdisk
835.Xc
836.It Xo
837.Ic lp
838.Op Fl r
839.Op Fl s
840.Op Fl v
841.Op Fl V
842.Op Ar plex
843.Xc
844.It Xo
845.Ic lv
846.Op Fl r
847.Op Fl s
848.Op Fl v
849.Op Fl V
850.Op Ar volume
851.Xc
852.Ic list
853is used to show information about the specified object.  If the argument is
854omitted, information is shown about all objects known to
855.Nm .
856The
857.Ic l
858command is a synonym for
859.Ic list .
860.Pp
861The
862.Fl r
863option relates to volumes and plexes: if specified, it recursively lists
864information for the subdisks and (for a volume) plexes subordinate to the
865objects.  The commands
866.Ic lv , lp , ls
867and
868.Ic ld
869list only volumes, plexes, subdisks and drives respectively.  This is
870particularly useful when used without parameters.
871.Pp
872The
873.Fl s
874option causes
875.Nm
876to output device statistics, the
877.Fl v
878(verbose) option causes some additional information to be output, and the
879.Fl V
option causes considerable additional information to be output.
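.Pp
The following sketch shows how the options combine.  The volume name
.Li myvolume
is hypothetical, and the output is omitted because it depends on the
configuration.
.Bd -literal -offset indent
vinum -> lv -r myvolume
vinum -> ld -s
vinum -> list -V
.Ed
.Pp
The first command lists
.Li myvolume
together with its plexes and subdisks, the second lists all drives with
statistics, and the third lists every object with the most detailed output.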
881.Pp
882.It Ic makedev
883The
884.Ic makedev
885command removes the directory
886.Pa /dev/vinum
887and recreates it with device nodes
888which reflect the current configuration.  This command is not intended for
889general use, and is provided for emergency use only.
890.Pp
891.It Xo
892.Ic mirror
893.Op Fl f
894.Op Fl n Ar name
895.Op Fl s
896.Op Fl v
897.Ar drives
898.Xc
899The
900.Ic mirror
901command provides a simplified alternative to the
902.Ic create
903command for creating mirrored volumes.  Without any options, it creates a RAID-1
904(mirrored) volume with two concatenated plexes.  The largest contiguous space
905available on each drive is used to create the subdisks for the plexes.  The
906first plex is built from the odd-numbered drives in the list, and the second
907plex is built from the even-numbered drives.  If the drives are of different
908sizes, the plexes will be of different sizes.
909.Pp
910If the
911.Fl s
912option is provided,
913.Ic mirror
914builds striped plexes with a stripe size of 256 kB.  The size of the subdisks in
915each plex is the size of the smallest contiguous storage available on any of the
916drives which form the plex.  Again, the plexes may differ in size.
917.Pp
918Normally, the
919.Ic mirror
920command creates an arbitrary name for the volume and its components.  The name
921is composed of the text
922.Dq Li vinum
923and a small integer, for example
924.Dq Li vinum3 .
925You can override this with the
926.Fl n Ar name
927option, which assigns the name specified to the volume.  The plexes and subdisks
928are named after the volume in the default manner.
929.Pp
930There is no choice of name for the drives.  If the drives have already been
931initialized as
932.Nm
933drives, the name remains.  Otherwise the drives are given names starting with
934the text
935.Dq Li vinumdrive
936and a small integer, for example
937.Dq Li vinumdrive7 .
938As with the
939.Ic create
940command, the
941.Fl f
option can be used to specify that a previous name should be overwritten.  The
.Fl v
option requests verbose output.
945.Pp
946See the section
947.Sx SIMPLIFIED CONFIGURATION
948below for some examples of this
949command.
950.Pp
951.It Ic mv Fl f Ar drive object ...
952.It Ic move Fl f Ar drive object ...
953Move all the subdisks from the specified objects onto the new drive.  The
954objects may be subdisks, drives or plexes.  When drives or plexes are specified,
955all subdisks associated with the object are moved.
956.Pp
957The
958.Fl f
959option is required for this function, since it currently does not preserve the
960data in the subdisk.  This functionality will be added at a later date.  In this
961form, however, it is suited to recovering a failed disk drive.
962.Pp
963.It Ic printconfig Op Ar file
964Write a copy of the current configuration to
965.Ar file
966in a format that can be used to recreate the
967.Nm
968configuration.  Unlike the configuration saved on disk, it includes definitions
969of the drives.  If you omit
970.Ar file ,
971.Nm
972writes the list to
973.Dv stdout .
974.Pp
975.It Ic quit
976Exit the
977.Nm
978program when running in interactive mode.  Normally this would be done by
979entering the
980.Dv EOF
981character.
982.Pp
983.It Ic read Ar disk ...
984The
985.Ic read
986command scans the specified disks for
987.Nm
988partitions containing previously created configuration information.  It reads
989the configuration in order from the most recently updated to least recently
990updated configuration.
991.Nm
992maintains an up-to-date copy of all configuration information on each disk
partition.  You must specify all of the slices in a configuration as
parameters to this command.
995.Pp
996The
997.Ic read
998command is intended to selectively load a
999.Nm
1000configuration on a system which has other
1001.Nm
1002partitions.  If you want to start all partitions on the system, it is easier to
1003use the
1004.Ic start
1005command.
1006.Pp
1007If
1008.Nm
1009encounters any errors during this command, it will turn off automatic
1010configuration update to avoid corrupting the copies on disk.  This will also
1011happen if the configuration on disk indicates a configuration error (for
1012example, subdisks which do not have a valid space specification).  You can turn
1013the updates on again with the
1014.Ic setdaemon
1015and
1016.Ic saveconfig
1017commands.  Reset bit 2 (numerical value 4) of the daemon options mask to
1018re-enable configuration saves.
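.Pp
As a sketch of the last step, suppose the daemon options mask currently has
only bit 2 set, that is, a value of 4.  This starting value is an assumption
for illustration; check the actual mask before changing it, so that other bits
are preserved.  Clearing bit 2 then leaves a mask of 0:
.Bd -literal -offset indent
vinum -> setdaemon 0
vinum -> saveconfig
.Ed
.Pp
If the mask were 5 instead, the value to set would be 1 (5 minus 4).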
1019.Pp
1020.It Xo
1021.Ic rebuildparity
1022.Ar plex
1023.Op Fl f
1024.Op Fl v
1025.Op Fl V
1026.Xc
1027Rebuild the parity blocks on the specified RAID-4 or RAID-5 plex.  This
1028operation maintains a pointer in the plex, so it can be stopped and later
1029restarted from the same position if desired.  In addition, this pointer is used
1030by the
1031.Ic checkparity
1032command, so rebuilding the parity blocks need only start at the location where
1033the first parity problem has been detected.
1034.Pp
1035If the
1036.Fl f
1037flag is specified,
1038.Ic rebuildparity
1039starts rebuilding at the beginning of the plex.  If the
1040.Fl v
1041flag is specified,
1042.Ic rebuildparity
first checks the existing parity blocks and prints information about those found to
1044be incorrect before rebuilding.  If the
1045.Fl V
1046flag is specified,
1047.Ic rebuildparity
1048prints a running progress report.
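.Pp
A typical sequence, sketched here with the hypothetical plex name
.Li raid5vol.p0 ,
is to check the whole plex first and then rebuild starting from the pointer
left by
.Ic checkparity :
.Bd -literal -offset indent
vinum -> checkparity raid5vol.p0 -f -v
vinum -> rebuildparity raid5vol.p0 -v
.Ed
.Pp
Omitting
.Fl f
from the second command lets
.Ic rebuildparity
continue from the pointer stored in the plex rather than from the beginning.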
1049.Pp
1050.It Xo
1051.Ic rename
1052.Op Fl r
1053.Op Ar drive | subdisk | plex | volume
1054.Ar newname
1055.Xc
1056Change the name of the specified object.  If the
1057.Fl r
1058option is specified, subordinate objects will be named by the default rules:
1059plex names will be formed by appending
1060.Li .p Ns Ar number
1061to the volume name, and
1062subdisk names will be formed by appending
1063.Li .s Ns Ar number
1064to the plex name.
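.Pp
For example, assume a volume named
.Li data
with one plex and two subdisks.  All names here are hypothetical, and the
numbering from 0 follows the
.Li vol1.p7.s0
style used elsewhere in this page.  The lines after the command summarize the
names a recursive rename would be expected to produce; they are not program
output.
.Bd -literal -offset indent
vinum -> rename -r data archive

volume     archive
plex       archive.p0
subdisk    archive.p0.s0
subdisk    archive.p0.s1
.Ed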
.\".Pp
.\".It Xo
.\".Ic replace
.\".Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive.  This will
.\"attempt to recover those subdisks that can be recovered, and create the others
.\"from scratch.  If the new drive lacks the space for this operation, as many
.\"subdisks as possible will be fitted onto the drive, and the rest will be left on
.\"the original drive.
.Pp
.It Ic resetconfig
The
.Ic resetconfig
command completely obliterates the
.Nm
configuration on a system.  Use this command only when you want to completely
delete the configuration.
.Nm
will ask for confirmation; you must type in the words
.Li "NO FUTURE"
exactly as shown:
.Bd -unfilled -offset indent
.No # Nm Ic resetconfig

WARNING!  This command will completely wipe out your vinum
configuration.  All data will be lost.  If you really want
to do this, enter the text

NO FUTURE
.No "Enter text ->" Sy "NO FUTURE"
Vinum configuration obliterated
.Ed
.Pp
As the message suggests, this is a last-ditch command.  Don't use it unless you
have an existing configuration which you never want to see again.
.Pp
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
.Nm
maintains a number of statistical counters for each object.  See the header file
.Aq Pa sys/dev/vinumvar.h
for more information.
.\" XXX put it in here when it's finalized
Use the
.Ic resetstats
command to reset these counters.  In conjunction with the
.Fl r
option,
.Nm
also resets the counters of subordinate objects.
.Pp
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
.Ic rm
removes an object from the
.Nm
configuration.  Once an object has been removed, there is no way to recover it.
Normally
.Nm
performs a large amount of consistency checking before removing an object.  The
.Fl f
option tells
.Nm
to omit this checking and remove the object anyway.  Use this option with great
care: it can result in total loss of data on a volume.
.Pp
Normally,
.Nm
refuses to remove a volume or plex if it has subordinate plexes or subdisks
respectively.  You can tell
.Nm
to remove the object anyway by using the
.Fl f
option, or you can cause
.Nm
to remove the subordinate objects as well by using the
.Fl r
(recursive) option.  If you remove a volume with the
.Fl r
option, it will remove both the plexes and the subdisks which belong to the
plexes.
.Pp
.It Ic saveconfig
Save the current configuration to disk.  Normally this is not necessary, since
.Nm
automatically saves any change in configuration.  If an error occurs on startup,
updates will be disabled.  When you reenable them with the
.Ic setdaemon
command,
.Nm
does not automatically save the configuration to disk.  Use this command to save
the configuration.
.\".Pp
.\".It Xo
.\".Ic set
.\".Op Fl f
.\".Ar state
.\".Ar volume | plex | subdisk | disk
.\".Xc
.\".Ic set
.\"sets the state of the specified object to one of the valid states (see
.\".Sx OBJECT STATES
.\"below).  Normally
.\".Nm
.\"performs a large amount of consistency checking before making the change.  The
.\".Fl f
.\"option tells
.\".Nm
.\"to omit this checking and perform the change anyway.  Use this option with great
.\"care: it can result in total loss of data on a volume.
.Pp
.It Ic setdaemon Op Ar value
.Ic setdaemon
sets a variable bitmask for the
.Nm
daemon.  This command is temporary and will be replaced.  Currently, the bit mask
may contain the bits 1 (log every action to syslog) and 4 (don't update
configuration).  Option bit 4 can be useful for error recovery.
.Pp
.It Xo
.Ic setstate Ar state
.Op Ar volume | plex | subdisk | drive
.Xc
.Ic setstate
sets the state of the specified objects to the specified state.  This bypasses
the usual consistency mechanism of
.Nm
and should be used only for recovery purposes.  It is possible to crash the
system by incorrect use of this command.
.Pp
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Op Ar plex | subdisk
.Xc
.Ic start
starts (brings into the
.Em up
state) one or more
.Nm
objects.
.Pp
If no object names are specified,
.Nm
scans the disks known to the system for
.Nm
drives and then reads in the configuration as described under the
.Ic read
command.  The
.Nm
drive contains a header with all information about the data stored on the drive,
including the names of the other drives which are required in order to represent
plexes and volumes.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk.  This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification).  You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands.  Reset bit 4 of the daemon options mask to re-enable configuration
saves.
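.Pp
A minimal sketch of this recovery sequence follows.  It assumes that the value
is given as the numeric mask and that no other daemon option bits are wanted: a
mask of 0 clears bit 4 together with the logging bit 1, so use 1 instead to
keep logging enabled.
.Bd -literal -offset indent
vinum -> setdaemon 0
vinum -> saveconfig
.Ed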
.Pp
If object names are specified,
.Nm
starts them.  Normally this operation is only of use with subdisks.  The action
depends on the current state of the object:
.Bl -bullet
.It
If the object is already in the
.Em up
state,
.Nm
does nothing.
.It
If the object is a subdisk in the
.Em down
or
.Em reborn
states,
.Nm
changes it to the
.Em up
state.
.It
If the object is a subdisk in the
.Em empty
state, the change depends on the subdisk.  If it is part of a plex which is part
of a volume which contains other plexes,
.Nm
places the subdisk in the
.Em reviving
state and attempts to copy the data from the volume.  When the operation
completes, the subdisk is set into the
.Em up
state.  If it is part of a plex which is part of a volume which contains no
other plexes, or if it is not part of a plex,
.Nm
brings it into the
.Em up
state immediately.
.It
If the object is a subdisk in the
.Em reviving
state,
.Nm
continues the revive
operation offline.  When the operation completes, the subdisk is set into the
.Em up
state.
.El
.Pp
When a subdisk comes into the
.Em up
state,
.Nm
automatically checks the state of any plex and volume to which it may belong and
changes their state where appropriate.
.Pp
If the object is a plex,
.Ic start
checks the state of the subordinate subdisks (and plexes in the case of a
volume) and starts any subdisks which can be started.
.Pp
To start a plex in a multi-plex volume, the data must be copied from another
plex in the volume.  Since this frequently takes a long time, it is normally
done in the background.  If you want to wait for this operation to complete (for
example, if you are performing this operation in a script), use the
.Fl w
option.
.Pp
Copying data doesn't just take a long time, it can also place a significant load
on the system.  You can specify the transfer size in bytes or sectors with the
.Fl S
option, and an interval (in milliseconds) to wait between copying each block with
the
.Fl i
option.  Both of these options lessen the load on the system.
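.Pp
For example, the following sketch starts a plex, throttles the copy, and waits
for it to finish.  The plex name
.Ar mirror.p1
and the values are hypothetical, and it is an assumption that a plain number
given to
.Fl S
is interpreted as a byte count:
.Bd -literal -offset indent
vinum -> start -i 50 -S 65536 -w mirror.p1
.Ed
.Pp
Under these assumptions each transfer is 65536 bytes, and
.Nm
waits 50 milliseconds between transfers.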
.Pp
.It Xo
.Ic stop
.Op Fl f
.Op Ar volume | plex | subdisk
.Xc
If no parameters are specified,
.Ic stop
removes the
.Nm
KLD and stops
.Xr vinum 4 .
This can only be done if no objects are active.  In particular, the
.Fl f
option does not override this requirement.  Normally, the
.Ic stop
command writes the current configuration back to the drives before terminating.
This will not be possible if configuration updates are disabled, so
.Nm
will not stop if configuration updates are disabled.  You can override this by
specifying the
.Fl f
option.
.Pp
The
.Ic stop
command can only work if
.Nm
has been loaded as a KLD, since it is not possible to unload a statically
configured driver.
.Nm Ic stop
will fail if
.Nm
is statically configured.
.Pp
If object names are specified,
.Ic stop
disables access to the objects.  If the objects have subordinate objects, the
subordinate objects must either already be inactive (stopped or in error), or
the
.Fl r
and
.Fl f
options must be specified.  This command does not remove the objects from the
configuration.  They can be accessed again after a
.Ic start
command.
.Pp
By default,
.Nm
does not stop active objects.  For example, you cannot stop a plex which is
attached to an active volume, and you cannot stop a volume which is open.  The
.Fl f
option tells
.Nm
to omit this checking and stop the object anyway.  Use this option with great
care and understanding: used incorrectly, it can result in serious data
corruption.
.Pp
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic stripe
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single striped plex.  The size of the
subdisks is the size of the largest contiguous space available on all the
specified drives.  The stripe size is fixed at 256 kB.
.Pp
Normally, the
.Ic stripe
command creates an arbitrary name for the volume and its components.  The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume.  The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives.  If the drives have already been
initialized as
.Nm
drives, the name remains.  Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten.  The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.El
.Sh SIMPLIFIED CONFIGURATION
This section describes a simplified interface to
.Nm
configuration using the
.Ic concat ,
.Ic mirror
and
.Ic stripe
commands.  These commands create convenient configurations for some common
situations, but they are not as flexible as the
.Ic create
command.
.Pp
See above for the description of the commands.  Here are some examples, all
performed with the same collection of disks.  Note that the first drive,
.Pa /dev/da1h ,
is smaller than the others.  This has an effect on the sizes chosen for each
kind of subdisk.
.Pp
The following examples all use the
.Fl v
option to show the commands passed to the system, and also to list the structure
of the volume.  Without the
.Fl v
option, these commands produce no output.
.Ss Volume with a single concatenated plex
Use a volume with a single concatenated plex for the largest possible storage
without resilience to drive failures:
.Bd -literal
vinum -> concat -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
volume vinum0
  plex name vinum0.p0 org concat
drive vinumdrive0 device /dev/da1h
    sd name vinum0.p0.s0 drive vinumdrive0 size 0
drive vinumdrive1 device /dev/da2h
    sd name vinum0.p0.s1 drive vinumdrive1 size 0
drive vinumdrive2 device /dev/da3h
    sd name vinum0.p0.s2 drive vinumdrive2 size 0
drive vinumdrive3 device /dev/da4h
    sd name vinum0.p0.s3 drive vinumdrive3 size 0
V vinum0                State: up       Plexes:       1 Size:       2134 MB
P vinum0.p0           C State: up       Subdisks:     4 Size:       2134 MB
S vinum0.p0.s0          State: up       PO:        0  B Size:        414 MB
S vinum0.p0.s1          State: up       PO:      414 MB Size:        573 MB
S vinum0.p0.s2          State: up       PO:      988 MB Size:        573 MB
S vinum0.p0.s3          State: up       PO:     1561 MB Size:        573 MB
.Ed
.Pp
In this case, the complete space on all four disks was used, giving a volume
2134 MB in size.
.Ss Volume with a single striped plex
A volume with a single striped plex may give better performance than a
concatenated plex, but restrictions on striped plexes can mean that the volume
is smaller.  It will also not be resilient to a drive failure:
.Bd -literal
vinum -> stripe -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume vinum0
  plex name vinum0.p0 org striped 256k
    sd name vinum0.p0.s0 drive vinumdrive0 size 849825b
    sd name vinum0.p0.s1 drive vinumdrive1 size 849825b
    sd name vinum0.p0.s2 drive vinumdrive2 size 849825b
    sd name vinum0.p0.s3 drive vinumdrive3 size 849825b
V vinum0                State: up       Plexes:       1 Size:       1659 MB
P vinum0.p0           S State: up       Subdisks:     4 Size:       1659 MB
S vinum0.p0.s0          State: up       PO:        0  B Size:        414 MB
S vinum0.p0.s1          State: up       PO:      256 kB Size:        414 MB
S vinum0.p0.s2          State: up       PO:      512 kB Size:        414 MB
S vinum0.p0.s3          State: up       PO:      768 kB Size:        414 MB
.Ed
.Pp
In this case, the size of the subdisks has been limited to the size of the
smallest available disk, so the resulting volume is only 1659 MB in size.
.Ss Mirrored volume with two concatenated plexes
For more reliability, use a mirrored, concatenated volume:
.Bd -literal
vinum -> mirror -v -n mirror /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd name mirror.p0.s0 drive vinumdrive0 size 0b
    sd name mirror.p0.s1 drive vinumdrive2 size 0b
  plex name mirror.p1 org concat
    sd name mirror.p1.s0 drive vinumdrive1 size 0b
    sd name mirror.p1.s1 drive vinumdrive3 size 0b
V mirror                State: up       Plexes:       2 Size:       1146 MB
P mirror.p0           C State: up       Subdisks:     2 Size:        988 MB
P mirror.p1           C State: up       Subdisks:     2 Size:       1146 MB
S mirror.p0.s0          State: up       PO:        0  B Size:        414 MB
S mirror.p0.s1          State: up       PO:      414 MB Size:        573 MB
S mirror.p1.s0          State: up       PO:        0  B Size:        573 MB
S mirror.p1.s1          State: up       PO:      573 MB Size:        573 MB
.Ed
.Pp
This example specifies the name of the volume,
.Ar mirror .
Since one drive is smaller than the others, the two plexes are of different
size, and the last 158 MB of the volume is non-resilient.  To ensure complete
reliability in such a situation, use the
.Ic create
command to create a volume with 988 MB.
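.Pp
One possible configuration file for such a volume is sketched below.  It
assumes the same four drives in an empty state, and it takes the subdisk size
for the smaller drive, 849825 sectors, from the output of the
.Ic stripe
example above.  Both are assumptions about this particular set of disks.
.Bd -literal
# Sketch only: drive names and the sector count are taken from the
# examples above and assume that the drives hold no other subdisks.
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd length 0 drive vinumdrive0
    sd length 0 drive vinumdrive2
  plex name mirror.p1 org concat
    sd length 0 drive vinumdrive1
    sd length 849825s drive vinumdrive3
.Ed
.Pp
Each plex then consists of one 414 MB subdisk and one 573 MB subdisk, so both
plexes are 988 MB in size and the whole volume is mirrored.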
.Ss Mirrored volume with two striped plexes
Alternatively, use the
.Fl s
option to create a mirrored volume with two striped plexes:
.Bd -literal
vinum -> mirror -v -n raid10 -s /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume raid10 setupstate
  plex name raid10.p0 org striped 256k
    sd name raid10.p0.s0 drive vinumdrive0 size 849825b
    sd name raid10.p0.s1 drive vinumdrive2 size 849825b
  plex name raid10.p1 org striped 256k
    sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
    sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
V raid10                State: up       Plexes:       2 Size:       1146 MB
P raid10.p0           S State: up       Subdisks:     2 Size:        829 MB
P raid10.p1           S State: up       Subdisks:     2 Size:       1146 MB
S raid10.p0.s0          State: up       PO:        0  B Size:        414 MB
S raid10.p0.s1          State: up       PO:      256 kB Size:        414 MB
S raid10.p1.s0          State: up       PO:        0  B Size:        573 MB
S raid10.p1.s1          State: up       PO:      256 kB Size:        573 MB
.Ed
.Pp
In this case, the usable part of the volume is even smaller, since the first
plex has shrunk to match the smallest drive.
.Sh CONFIGURATION FILE
.Nm
requires that all parameters to the
.Ic create
command be in a configuration file.  Entries in the configuration file
define volumes, plexes and subdisks, and may be in free format, except that each
entry must be on a single line.
.Ss Scale factors
Some configuration file parameters specify a size (lengths, stripe sizes).
These values can be specified as bytes, or one of the following scale factors
may be appended:
.Bl -tag -width indent
.It s
specifies that the value is a number of sectors of 512 bytes.
.It k
specifies that the value is a number of kilobytes (1024 bytes).
.It m
specifies that the value is a number of megabytes (1048576 bytes).
.It g
specifies that the value is a number of gigabytes (1073741824 bytes).
.It b
is used for compatibility with
.Tn VERITAS .
It stands for blocks of 512 bytes.
This abbreviation is confusing, since the word
.Dq block
is used with several different
meanings, and its use is deprecated.
.El
.Pp
For example, the value 16777216 bytes can also be written as
.Em 16m ,
.Em 16384k
or
.Em 32768s .
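.Pp
As an illustration, each of the following alternative subdisk entries requests
the same 16777216 bytes.  The drive name
.Ar d1
is hypothetical.
.Bd -literal -offset indent
# plain byte count
sd length 16777216 drive d1
# 16 * 1048576 bytes
sd length 16m drive d1
# 16384 * 1024 bytes
sd length 16384k drive d1
# 32768 * 512 bytes
sd length 32768s drive d1
.Ed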
.Pp
The configuration file can contain the following entries:
.Bl -tag -width 4n
.It Ic drive Ar name devicename Op Ar options
Define a drive.  The options are:
.Bl -tag -width 18n
.It Cm device Ar devicename
Specify the device on which the drive resides.
.Ar devicename
must be the name of a disk partition, for example
.Pa /dev/da1e
or
.Pa /dev/ad3s2h ,
and it must be of type
.Em vinum .
Do not use the
.Dq Li c
partition, which is reserved for the complete disk.
.It Cm hotspare
Define the drive to be a
.Dq hot spare
drive, which is maintained to automatically replace a failed drive.
.Nm
does not allow this drive to be used for any other purpose.  In particular, it
is not possible to create subdisks on it.  This functionality has not been
completely implemented.
.El
.It Ic volume Ar name Op Ar options
Define a volume with name
.Ar name .
Options are:
.Bl -tag -width 18n
.It Cm plex Ar plexname
Add the specified plex to the volume.  If
.Ar plexname
is specified as
.Cm * ,
.Nm
will look for the definition of the plex as the next possible entry in the
configuration file after the definition of the volume.
.It Cm readpol Ar policy
Define a
.Em read policy
for the volume.
.Ar policy
may be either
.Cm round
or
.Cm prefer Ar plexname .
.Nm
satisfies a read request from only one of the plexes.  A
.Cm round
read policy specifies that each read should be performed from a different plex
in
.Em round-robin
fashion.  A
.Cm prefer
read policy reads from the specified plex every time.
.It Cm setupstate
When creating a multi-plex volume, assume that the contents of all the plexes
are consistent.  This is normally not the case, so by default
.Nm
sets all plexes except the first one to the
.Em faulty
state.  Use the
.Ic start
command to first bring them to a consistent state.  In the case of striped and
concatenated plexes, however, it does not normally cause problems to leave them
inconsistent: when using a volume for a file system or a swap partition, the
previous contents of the disks are not of interest, so they may be ignored.
If you want to take this risk, use the
.Cm setupstate
keyword.  It will only apply to the plexes defined immediately after the volume
in the configuration file.  If you add plexes to a volume at a later time, you
must integrate them manually with the
.Ic start
command.
.Pp
Note that you
.Em must
use the
.Ic init
command with RAID-5 plexes: otherwise extreme data corruption will result if one
subdisk fails.
.El
.It Ic plex Op Ar options
Define a plex.  Unlike a volume, a plex does not need a name.  The options may
be:
.Bl -tag -width 18n
.It Cm name Ar plexname
Specify the name of the plex.  Note that you must use the keyword
.Cm name
when naming a plex or subdisk.
.It Cm org Ar organization Op Ar stripesize
Specify the organization of the plex.
.Ar organization
can be one of
.Cm concat , striped
or
.Cm raid5 .
For
.Cm striped
and
.Cm raid5
plexes, the parameter
.Ar stripesize
must be specified, while for
.Cm concat
it must be omitted.  For type
.Cm striped ,
it specifies the width of each stripe.  For type
.Cm raid5 ,
it specifies the size of a group.  A group is a portion of a plex which
stores all of its parity bits in the same subdisk.  It must be a factor of the
plex size (in
other words, the result of dividing the plex size by the stripe size must be an
integer), and it must be a multiple of a disk sector (512 bytes).
.Pp
For optimum performance, stripes should be at least 128 kB in size: anything
smaller will result in a significant increase in I/O activity due to mapping of
individual requests over multiple disks.  The performance improvement due to the
increased number of concurrent transfers caused by this mapping will not make up
for the performance drop due to the increase in latency.  A good guideline for
stripe size is between 256 kB and 512 kB.  Avoid powers of 2, however: they tend
to cause all superblocks to be placed on the first subdisk.
.Pp
A striped plex must have at least two subdisks (otherwise it is a concatenated
plex), and each must be the same size.  A RAID-5 plex must have at least three
subdisks, and each must be the same size.  In practice, a RAID-5 plex should
have at least 5 subdisks.
.It Cm volume Ar volname
Add the plex to the specified volume.  If no
.Cm volume
keyword is specified, the plex will be added to the last volume mentioned in the
configuration file.
.It Cm sd Ar sdname offset
Add the specified subdisk to the plex at offset
.Ar offset .
.El
1730.El
1731.It Ic subdisk Op Ar options
1732Define a subdisk.  Options may be:
1733.Bl -hang -width 18n
1734.It Cm name Ar name
1735Specify the name of a subdisk.  It is not necessary to specify a name for a
1736subdisk, see
1737.Sx OBJECT NAMING
1738above.  Note that you must specify the keyword
1739.Cm name
1740if you wish to name a subdisk.
1741.It Cm plexoffset Ar offset
1742Specify the starting offset of the subdisk in the plex.  If not specified,
1743.Nm
1744allocates the space immediately after the previous subdisk, if any, or otherwise
1745at the beginning of the plex.
1746.It Cm driveoffset Ar offset
1747Specify the starting offset of the subdisk in the drive.  If not specified,
1748.Nm
1749allocates the first contiguous
1750.Ar length
1751bytes of free space on the drive.
1752.It Cm length Ar length
1753Specify the length of the subdisk.  This keyword must be specified.  There is no
1754default, but the value 0 may be specified to mean
1755.Dq "use the largest available contiguous free area on the drive" .
1756If the drive is empty, this means that the entire drive will be used for the
1757subdisk.
1758.Cm length
1759may be shortened to
1760.Cm len .
1761.It Cm plex Ar plex
1762Specify the plex to which the subdisk belongs.  By default, the subdisk belongs
1763to the last plex specified.
1764.It Cm drive Ar drive
1765Specify the drive on which the subdisk resides.  By default, the subdisk resides
1766on the last drive specified.
1767.El
1768.El
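.Pp
As a sketch of how the subdisk options combine, the following entries place
two named subdisks from the same drive one after the other in a concatenated
plex.  All names, offsets and lengths are hypothetical.  The drive offsets are
chosen to lie beyond the area that
.Nm
reserves for configuration information at the start of the drive.
.Bd -literal -offset indent
# Sketch only: names, offsets and lengths are illustrative
drive d1 device /dev/da2e
volume data
  plex name data.p0 org concat
    sd name data.p0.s0 drive d1 driveoffset 1m length 100m plexoffset 0
    sd name data.p0.s1 drive d1 driveoffset 101m length 50m plexoffset 100m
.Ed
.Pp
The second subdisk starts 100 MB into the plex, directly after the first.  This
is also where
.Nm
would place it if
.Cm plexoffset
were omitted.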
.Sh EXAMPLE CONFIGURATION FILE
.Bd -literal
# Sample vinum configuration file
#
# Our drives
drive drive1 device /dev/da1h
drive drive2 device /dev/da2h
drive drive3 device /dev/da3h
drive drive4 device /dev/da4h
drive drive5 device /dev/da5h
drive drive6 device /dev/da6h
# A volume with one striped plex
volume tinyvol
 plex org striped 512b
  sd length 64m drive drive2
  sd length 64m drive drive4
volume stripe
 plex org striped 512b
  sd length 512m drive drive2
  sd length 512m drive drive4
# Two plexes
volume concat
 plex org concat
  sd length 100m drive drive2
  sd length 50m drive drive4
 plex org concat
  sd length 150m drive drive4
# A volume with one striped plex and one concatenated plex
volume strcon
 plex org striped 512b
  sd length 100m drive drive2
  sd length 100m drive drive4
 plex org concat
  sd length 150m drive drive2
  sd length 50m drive drive4
# a volume with a RAID-5 and a striped plex
# note that the RAID-5 plex is longer by
# the length of one subdisk
volume vol5
 plex org striped 64k
  sd length 1000m drive drive2
  sd length 1000m drive drive4
 plex org raid5 32k
  sd length 500m drive drive1
  sd length 500m drive drive2
  sd length 500m drive drive3
  sd length 500m drive drive4
  sd length 500m drive drive5
.Ed
.Sh DRIVE LAYOUT CONSIDERATIONS
.Nm
drives are currently
.Bx
disk partitions.  They must be of type
.Em vinum
in order to avoid overwriting data used for other purposes.  Use
.Nm disklabel Fl e
to edit a partition type definition.  The following display shows a typical
partition layout as shown by
.Xr disklabel 8 :
.Bd -literal
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:    81920   344064    4.2BSD        0     0     0   # (Cyl.  240*- 297*)
  b:   262144    81920      swap                        # (Cyl.   57*- 240*)
  c:  4226725        0    unused        0     0         # (Cyl.    0 - 2955*)
  e:    81920        0    4.2BSD        0     0     0   # (Cyl.    0 - 57*)
  f:  1900000   425984    4.2BSD        0     0     0   # (Cyl.  297*- 1626*)
  g:  1900741  2325984     vinum        0     0     0   # (Cyl. 1626*- 2955*)
.Ed
.Pp
In this example, partition
.Dq Li g
may be used as a
.Nm
partition.  Partitions
.Dq Li a ,
.Dq Li e
and
.Dq Li f
may be used as
.Em UFS
file systems or
.Em ccd
partitions.  Partition
.Dq Li b
is a swap partition, and partition
.Dq Li c
represents the whole disk and should not be used for any other purpose.
.Pp
.Nm
uses the first 265 sectors on each partition for configuration information, so
the maximum size of a subdisk is 265 sectors smaller than the drive.
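.Pp
For example, partition
.Dq Li g
in the listing above has 1900741 sectors, so by this rule the largest subdisk
it can hold is 265 sectors fewer, or 1900476 sectors.  A sketch of a subdisk
entry requesting exactly that size, with a hypothetical drive name
.Ar d1 :
.Bd -literal -offset indent
# 1900741 sectors in the partition, less 265 reserved sectors
sd length 1900476s drive d1
.Ed
.Pp
A length of 0 requests the largest available contiguous area without requiring
this arithmetic.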
.Sh LOG FILE
.Nm
maintains a log file, by default
.Pa /var/tmp/vinum_history ,
in which it keeps track of the commands issued to
.Nm .
You can override the name of this file by setting the environment variable
.Ev VINUM_HISTORY
to the name of the file.
.Pp
Each message in the log file is preceded by a date.  The default format is
.Qq Li %e %b %Y %H:%M:%S .
See
.Xr strftime 3
for further details of the format string.  It can be overridden by the
environment variable
.Ev VINUM_DATEFORMAT .
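.Pp
As a sketch, both variables can be set for a single invocation with
.Xr env 1 .
The log file path and the date format shown here are hypothetical choices:
.Bd -literal
# env VINUM_HISTORY=/var/log/vinum_history VINUM_DATEFORMAT="%Y-%m-%d %H:%M:%S" vinum
.Ed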
.Sh HOW TO SET UP VINUM
This section gives practical advice about how to implement a
.Nm
system.
.Ss Where to put the data
The first choice you need to make is where to put the data.  You need dedicated
disk partitions for
.Nm .
They should be partitions, not devices, and they should not be partition
.Dq Li c .
For example, good names are
.Pa /dev/da0e
or
.Pa /dev/ad3s4a .
Bad names are
.Pa /dev/da0
and
.Pa /dev/da0s1 ,
both of which represent a device, not a partition, and
.Pa /dev/ad1c ,
which represents a complete disk and should be of type
.Em unused .
See the example under
.Sx DRIVE LAYOUT CONSIDERATIONS
above.
.Ss Designing volumes
The way you set up
.Nm
volumes depends on your intentions.  There are a number of possibilities:
.Bl -enum
.It
You may want to join up a number of small disks to make a reasonably sized file
system.  For example, if you had five small drives and wanted to use all the
space for a single volume, you might write a configuration file like:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
drive d5 device /dev/da6e
volume bigger
 plex org concat
   sd length 0 drive d1
   sd length 0 drive d2
   sd length 0 drive d3
   sd length 0 drive d4
   sd length 0 drive d5
.Ed
.Pp
In this case, you specify the length of the subdisks as 0, which means
.Dq "use the largest area of free space that you can find on the drive" .
If the subdisk is the only subdisk on the drive, it will use all available
space.
1932.It
1933You want to set up
1934.Nm
1935to obtain additional resilience against disk failures.  You have the choice of
1936RAID-1, also called
1937.Dq mirroring ,
1938or RAID-5, also called
1939.Dq parity .
1940.Pp
1941To set up mirroring, create multiple plexes in a volume.  For example, to create
1942a mirrored volume of 2 GB, you might create the following configuration file:
1943.Bd -literal -offset indent
1944drive d1 device /dev/da2e
1945drive d2 device /dev/da3e
1946volume mirror
1947 plex org concat
1948   sd length 2g drive d1
1949 plex org concat
1950   sd length 2g drive d2
1951.Ed
1952.Pp
1953When creating mirrored drives, it is important to ensure that the data from each
1954plex is on a different physical disk so that
1955.Nm
1956can access the complete address space of the volume even if a drive fails.
1957Note that each plex requires as much data as the complete volume: in this
1958example, the volume has a size of 2 GB, but each plex (and each subdisk)
1959requires 2 GB, so the total disk storage requirement is 4 GB.
1960.Pp
1961To set up RAID-5, create a single plex of type
1962.Cm raid5 .
1963For example, to create an equivalent resilient volume of 2 GB, you might use the
1964following configuration file:
1965.Bd -literal -offset indent
1966drive d1 device /dev/da2e
1967drive d2 device /dev/da3e
1968drive d3 device /dev/da4e
1969drive d4 device /dev/da5e
1970drive d5 device /dev/da6e
1971volume raid
1972 plex org raid5 512k
1973   sd length 512m drive d1
1974   sd length 512m drive d2
1975   sd length 512m drive d3
1976   sd length 512m drive d4
1977   sd length 512m drive d5
1978.Ed
1979.Pp
RAID-5 plexes require at least three subdisks.  The equivalent of one subdisk is
used for storing parity information and is lost for data storage.  The more
disks you use, the greater the proportion of the disk storage that can be used
for data.  In
this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored
configuration.  If you were to use the minimum of only three disks, you would
require 3 GB to store the information, for example:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
volume raid
 plex org raid5 512k
   sd length 1g drive d1
   sd length 1g drive d2
   sd length 1g drive d3
.Ed
.Pp
As with creating mirrored volumes, it is important to ensure that the data from
each subdisk is on a different physical disk so that
.Nm
can access the complete address space of the volume even if a drive fails.
.It
You want to set up
.Nm
to allow more concurrent access to a file system.  In many cases, access to a
file system is limited by the speed of the disk.  By spreading the volume across
multiple disks, you can increase the throughput in multi-access environments.
This technique shows little or no performance improvement in single-access
environments.
.Nm
uses a technique called
.Dq striping ,
or sometimes RAID-0, to increase this concurrency of access.  The name RAID-0 is
misleading: striping does not provide any redundancy or additional reliability.
In fact, it decreases the reliability, since the failure of a single disk will
render the volume useless, and the more disks you have, the more likely it is
that one of them will fail.
.Pp
To implement striping, use a
.Cm striped
plex:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid
 plex org striped 512k
   sd length 512m drive d1
   sd length 512m drive d2
   sd length 512m drive d3
   sd length 512m drive d4
.Ed
.Pp
A striped plex must have at least two subdisks, but the increase in performance
is greater if you have a larger number of disks.
.It
You may want to have the best of both worlds and have both resilience and
performance.  This is sometimes called RAID-10 (a combination of RAID-1 and
RAID-0), though again this name is misleading.  With
.Nm
you can do this with the following configuration file:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid setupstate
 plex org striped 512k
   sd length 512m drive d1
   sd length 512m drive d2
   sd length 512m drive d3
   sd length 512m drive d4
 plex org striped 512k
   sd length 512m drive d4
   sd length 512m drive d3
   sd length 512m drive d2
   sd length 512m drive d1
.Ed
.Pp
Here the plexes are striped, increasing performance, and there are two of them,
increasing reliability.  Note that this example shows the subdisks of the second
plex in reverse order from the first plex.  This is for performance reasons and
will be discussed below.  In addition, the volume specification includes the
keyword
.Cm setupstate ,
which ensures that all plexes are
.Em up
after creation.
.El
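.Pp
The storage figures quoted above for the mirrored and RAID-5 examples follow
from simple arithmetic.  The following Python sketch is an illustration only and
is not part of
.Nm ;
the function names are hypothetical.  It assumes equally sized subdisks and
ignores the small per-drive overhead for label and configuration information.

```python
def mirrored_raw_gb(volume_gb, plexes=2):
    """Raw storage for a mirrored volume: every plex holds a full copy."""
    return volume_gb * plexes


def raid5_raw_gb(volume_gb, subdisks):
    """Raw storage for a RAID-5 plex: one subdisk's worth of space is parity."""
    if subdisks < 3:
        raise ValueError("a RAID-5 plex needs at least three subdisks")
    # The data is spread over (subdisks - 1) subdisks' worth of space.
    subdisk_gb = volume_gb / (subdisks - 1)
    return subdisk_gb * subdisks


if __name__ == "__main__":
    print(mirrored_raw_gb(2))    # two concatenated plexes of 2 GB each
    print(raid5_raw_gb(2, 5))    # five subdisks of 512 MB
    print(raid5_raw_gb(2, 3))    # three subdisks of 1 GB
```

Under these assumptions the sketch reproduces the 4 GB, 2.5 GB and 3 GB totals
of the three configuration files.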
.Ss Creating the volumes
Once you have created your configuration files, start
.Nm
and create the volumes.  In this example, the configuration is in the file
.Pa configfile :
.Bd -literal -offset 2n
# vinum create -v configfile
   1: drive d1 device /dev/da2e
   2: drive d2 device /dev/da3e
   3: volume mirror
   4:  plex org concat
   5:    sd length 2g drive d1
   6:  plex org concat
   7:    sd length 2g drive d2
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         2 (8 configured)
Subdisks:       2 (16 configured)

Drive d1:       Device /dev/da2e
                Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
                Config last updated Tue Mar 23 14:30:32 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none
Drive d2:       Device /dev/da3e
                Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
                Config last updated Tue Mar 23 14:30:33 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none

Volume mirror:  Size: 2147483648 bytes (2048 MB)
                State: up
                Flags:
                2 plexes
                Read policy: round robin

Plex mirror.p0: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror
Plex mirror.p1: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror

Subdisk mirror.p0.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p0 at offset 0

Subdisk mirror.p1.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p1 at offset 0
.Ed
.Pp
The
.Fl v
option tells
.Nm
to list the file as it configures.  It then lists the resulting
configuration in the same format as the
.Ic list Fl v
command.
.Ss Creating more volumes
Once you have created the
.Nm
volumes,
.Nm
keeps track of them in its internal configuration files.  You do not need to
create them again.  In particular, if you run the
.Ic create
command again, you will create additional objects:
.Bd -literal
# vinum create sampleconfig
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         4 (8 configured)
Subdisks:       4 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 53224/57320 MB (92%)
D d2                    State: up       Device /dev/da3e        Avail: 53224/57320 MB (92%)

V mirror                State: up       Plexes:       4 Size:       2048 MB

P mirror.p0           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: up       Subdisks:     1 Size:       2048 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
.Ed
.Pp
As this example (this time without the
.Fl v
option) shows, re-running the
.Ic create
command has created two new plexes, each with a new subdisk, so that the volume
now has four plexes.  If you want to add other
volumes, create new configuration files for them.  They do not need to define
the drives that
.Nm
already knows about.  For example, to create a volume
.Pa raid
on the four drives
.Pa /dev/da1e , /dev/da2e , /dev/da3e
and
.Pa /dev/da4e ,
you only need to mention the other two:
.Bd -literal -offset indent
drive d3 device /dev/da1e
drive d4 device /dev/da4e
volume raid
  plex org raid5 512k
    sd size 2g drive d1
    sd size 2g drive d2
    sd size 2g drive d3
    sd size 2g drive d4
.Ed
.Pp
With this configuration file, we get:
.Bd -literal
# vinum create newconfig
Configuration summary

Drives:         4 (4 configured)
Volumes:        2 (4 configured)
Plexes:         5 (8 configured)
Subdisks:       8 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 51176/57320 MB (89%)
D d2                    State: up       Device /dev/da3e        Avail: 53220/57320 MB (89%)
D d3                    State: up       Device /dev/da1e        Avail: 53224/57320 MB (92%)
D d4                    State: up       Device /dev/da4e        Avail: 53224/57320 MB (92%)

V mirror                State: down     Plexes:       4 Size:       2048 MB
V raid                  State: down     Plexes:       1 Size:       6144 MB

P mirror.p0           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: init     Subdisks:     1 Size:       2048 MB
P raid.p0            R5 State: init     Subdisks:     4 Size:       6144 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
S raid.p0.s0            State: empty    PO:        0  B Size:       2048 MB
S raid.p0.s1            State: empty    PO:      512 kB Size:       2048 MB
S raid.p0.s2            State: empty    PO:     1024 kB Size:       2048 MB
S raid.p0.s3            State: empty    PO:     1536 kB Size:       2048 MB
.Ed
.Pp
Note the size of the RAID-5 plex: it is only 6 GB, although together its
components use 8 GB of disk space.  This is because the equivalent of one
subdisk is used for storing parity data.
.Ss Restarting Vinum
On rebooting the system, start
.Nm
with the
.Ic start
command:
.Pp
.Dl "# vinum start"
.Pp
This will start all the
.Nm
drives in the system.  If for some reason you wish to start only some of them,
use the
.Ic read
command.
.Ss Performance considerations
A number of misconceptions exist about how to set up a RAID array for best
performance.  In particular, most systems use far too small a stripe size.  The
following discussion applies to all RAID systems, not just to
.Nm .
.Pp
The
.Fx
block I/O system issues requests of between 0.5 kB and 128 kB; a
typical mix is somewhere around 8 kB.  You can't stop any striping system from
breaking a request into two physical requests, and if you make the stripe small
enough, it can be broken into several.  This will result in a significant drop
in performance: the decrease in transfer time per disk is offset by the
order-of-magnitude greater increase in latency.
.Pp
With modern disk sizes and the
.Fx
I/O system, you can expect to have a
reasonably small number of fragmented requests with a stripe size between 256 kB
and 512 kB; with correct RAID implementations there is no obvious reason not to
increase the size to 2 or 4 MB on a large disk.
.Pp
When choosing a stripe size, consider that most current UFS file systems have
cylinder groups 32 MB in size.  If you have a stripe size and number of disks
both of which are a power of two, it is probable that all superblocks and inodes
will be placed on the same subdisk, which will impact performance significantly.
Choose an odd number instead, for example 479 kB.
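.Pp
The effect of a power-of-two stripe size can be seen with a small model.  The
following Python sketch is an illustration only, with hypothetical function
names.  It assumes the simplest possible striping layout, in which stripe number
n lives on subdisk n modulo the number of subdisks, that the file system starts
at offset 0 of the plex, and that only the first byte of each 32 MB cylinder
group matters.

```python
from collections import Counter

KB = 1024
CYLINDER_GROUP = 32 * 1024 * KB  # 32 MB, the cylinder group size quoted above


def subdisk_for_offset(offset, stripe_size, subdisks):
    """Index of the subdisk holding a byte offset (simplified striping model)."""
    return (offset // stripe_size) % subdisks


def group_start_distribution(stripe_size, subdisks, groups=64):
    """Count how many cylinder group starts land on each subdisk."""
    return Counter(
        subdisk_for_offset(g * CYLINDER_GROUP, stripe_size, subdisks)
        for g in range(groups)
    )


if __name__ == "__main__":
    # Power-of-two stripe size on four subdisks: every group starts on subdisk 0.
    print(sorted(group_start_distribution(512 * KB, 4).items()))
    # Odd stripe size: the group starts are spread over all four subdisks.
    print(sorted(group_start_distribution(479 * KB, 4).items()))
```

In this model a 512 kB stripe puts the start of every cylinder group on the
first subdisk, whereas a 479 kB stripe spreads them across all four.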
.Pp
The easiest way to consider the impact of any transfer in a multi-access system
is to look at it from the point of view of the potential bottleneck, the disk
subsystem: how much total disk time does the transfer use?
Since just about
everything is cached, the time relationship between the request and its
completion is not so important: the important parameter is the total time that
the request keeps the disks active, the time when the disks are not available to
perform other transfers.  As a result, it doesn't really matter if the transfers
are happening at the same time or different times.  In practical terms, the time
we're looking at is the sum of the total latency (positioning time and
rotational latency, or the time it takes for the data to arrive under the disk
heads) and the total transfer time.  For a given transfer to disks of the same
speed, the transfer time depends only on the total size of the transfer.
.Pp
Consider a typical news article or web page of 24 kB, which will probably be
read in a single I/O.  Take disks with a transfer rate of 6 MB/s and an average
positioning time of 8 ms, and a file system with 4 kB blocks.  Since it's 24 kB,
we don't have to worry about fragments, so the file will start on a 4 kB
boundary.  The number of transfers required depends on where the block starts:
on average it's (S + F - 1) / S, where S is the stripe size in file system
blocks, and F is the file size in file system blocks.  Transferring 24 kB at
6 MB/s takes about 4 ms, regardless of how the request is split.
.Bl -enum
.It
Stripe size of 4 kB.  You'll have 6 transfers.  Total subsystem load: 48 ms
latency, 4 ms transfer, 52 ms total.
.It
Stripe size of 8 kB.  On average, you'll have 3.5 transfers.  Total subsystem
load: 28 ms latency, 4 ms transfer, 32 ms total.
.It
Stripe size of 16 kB.  On average, you'll have 2.25 transfers.  Total subsystem
load: 18 ms latency, 4 ms transfer, 22 ms total.
.It
Stripe size of 256 kB.  On average, you'll have 1.08 transfers.  Total subsystem
load: 8.6 ms latency, 4 ms transfer, 12.6 ms total.
.It
Stripe size of 4 MB.  On average, you'll have 1.005 transfers.  Total subsystem
load: 8.04 ms latency, 4 ms transfer, 12.04 ms total.
.El
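.Pp
The load figures for other stripe sizes can be worked out from the same
formula.  The following Python sketch is an illustration only, with hypothetical
function names.  It takes its defaults from the example above (24 kB file, 4 kB
blocks, 8 ms positioning time, 6 MB/s transfer rate), counts only positioning
time as latency, and computes the transfer time directly from the file size and
the transfer rate, assuming 1 MB is 1024 kB.

```python
def average_transfers(stripe_kb, file_kb=24, block_kb=4):
    """Average physical transfers for one file: (S + F - 1) / S, in blocks."""
    s = stripe_kb / block_kb   # stripe size in file system blocks
    f = file_kb / block_kb     # file size in file system blocks
    return (s + f - 1) / s


def subsystem_load_ms(stripe_kb, file_kb=24, block_kb=4,
                      positioning_ms=8.0, rate_kb_per_s=6 * 1024):
    """Return (latency, transfer, total) disk time in milliseconds."""
    latency = average_transfers(stripe_kb, file_kb, block_kb) * positioning_ms
    transfer = file_kb / rate_kb_per_s * 1000.0
    return latency, transfer, latency + transfer


if __name__ == "__main__":
    for stripe_kb in (4, 8, 16, 256, 4096):
        n = average_transfers(stripe_kb)
        latency, transfer, total = subsystem_load_ms(stripe_kb)
        print(f"{stripe_kb:5d} kB: {n:.4f} transfers, "
              f"{latency:.2f} ms latency, {transfer:.2f} ms transfer, "
              f"{total:.2f} ms total")
```

The latency term dominates for small stripes and flattens out quickly once the
stripe is much larger than the file.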
.Pp
It appears that some hardware RAID systems have problems with large stripes:
they appear to always transfer a complete stripe to or from disk, so that a
large stripe size will have an adverse effect on performance.
.Nm
does not suffer from this problem: it optimizes all disk transfers and does not
transfer unneeded data.
.Pp
Note that no well-known benchmark program tests true multi-access conditions
(more than 100 concurrent users), so it is difficult to demonstrate the validity
of these statements.
.Pp
Given these considerations, the following factors affect the performance of a
.Nm
volume:
.Bl -bullet
.It
Striping improves performance for multiple access only, since it increases the
chance of individual requests being on different drives.
.It
Concatenating UFS file systems across multiple drives can also improve
performance for multiple file access, since UFS divides a file system into
cylinder groups and attempts to keep files in a single cylinder group.  In
general, it is not as effective as striping.
.It
Mirroring can improve multi-access performance for reads, since by default
.Nm
issues consecutive reads to consecutive plexes.
.It
Mirroring decreases performance for all writes, whether multi-access or single
access, since the data must be written to both plexes.  This explains the
subdisk layout in the example of a mirroring configuration above: if the
corresponding subdisk in each plex is on a different physical disk, the write
commands can be issued in parallel, whereas if they are on the same physical
disk, they will be performed sequentially.
.It
RAID-5 reads have essentially the same considerations as striped reads, unless
the striped plex is part of a mirrored volume, in which case the performance of
the mirrored volume will be better.
.It
RAID-5 writes are approximately 25% of the speed of striped writes: to perform
the write,
.Nm
must first read the data block and the corresponding parity block, perform some
calculations and write back the parity block and the data block, four times as
many transfers as for writing a striped plex.  On the other hand, this is offset
by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
approximately half the speed of writes to a correctly configured volume with two
striped plexes.
.It
When the
.Nm
configuration changes (for example, adding or removing objects, or the change of
state of one of the objects),
.Nm
writes up to 128 kB of updated configuration to each drive.  The larger the
number of drives, the longer this takes.
.El
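.Pp
The write-speed ratios in the list above follow from counting physical
transfers per logical write.  The following Python sketch is an illustration
only; the table and function names are hypothetical, and it assumes that write
speed is inversely proportional to the number of transfers, which ignores the
parity calculation and any overlap between transfers.

```python
# Physical transfers needed per logical write, as described in the list above.
TRANSFERS_PER_WRITE = {
    "striped": 1,           # write the data block
    "mirrored-striped": 2,  # write the data block to each of two plexes
    "raid5": 4,             # read data and parity, write parity and data
}


def relative_write_speed(organization, baseline="striped"):
    """Write speed relative to a baseline, assuming speed ~ 1 / transfers."""
    return TRANSFERS_PER_WRITE[baseline] / TRANSFERS_PER_WRITE[organization]


if __name__ == "__main__":
    print(relative_write_speed("raid5"))                      # against one striped plex
    print(relative_write_speed("raid5", "mirrored-striped"))  # against two striped plexes
```

With these assumptions a RAID-5 plex writes at a quarter of the speed of a
single striped plex and at half the speed of a volume with two striped plexes.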
.Ss Creating file systems on Vinum volumes
You do not need to run
.Xr disklabel 8
before creating a file system on a
.Nm
volume.  Just run
.Xr newfs 8 .
Use the
.Fl v
option to state that the device is not divided into partitions.  For example, to
create a file system on volume
.Pa mirror ,
enter the following command:
.Pp
.Dl "# newfs -v /dev/vinum/mirror"
.Pp
A number of other considerations apply to
.Nm
configuration:
.Bl -bullet
.It
There is no advantage in creating multiple drives on a single disk.  Each drive
uses 131.5 kB of space for label and configuration information, and performance
will suffer when the configuration changes.  Use appropriately sized subdisks instead.
.It
It is possible to increase the size of a concatenated
.Nm
plex, but currently the size of striped and RAID-5 plexes cannot be increased.
Currently the size of an existing UFS file system also cannot be increased, but
it is planned to make both plexes and file systems extensible.
.El
.Sh STATE MANAGEMENT
Vinum objects have the concept of
.Em state .
See
.Xr vinum 4
for more details.  They are only completely accessible if their state is
.Em up .
To change an object state to
.Em up ,
use the
.Ic start
command.  To change an object state to
.Em down ,
use the
.Ic stop
command.  Normally other states are set automatically as a result of the
relationship between objects.  For example, if you add a plex to a volume, the
subdisks of the plex will be set in the
.Em empty
state, indicating that, though the hardware is accessible, the data on the
subdisk is invalid.  As a result of this state, the plex will be set in the
.Em faulty
state.
.Ss The `reviving' state
In many cases, when you start a subdisk the system must copy data to the
subdisk.  Depending on the size of the subdisk, this can take a long time.
During this time, the subdisk is set in the
.Em reviving
state.  On successful completion of the copy operation, it is automatically set
to the
.Em up
state.  It is possible for the process performing the revive to be stopped and
restarted.  The system keeps track of how far the subdisk has been revived, and
when the
.Ic start
command is reissued, the copying continues from this point.
.Pp
In order to maintain the consistency of a volume while one or more of its plexes
is being revived,
.Nm
writes to subdisks which have been revived up to the point of the write.  It may
also read from the plex if the area being read has already been revived.
.Sh GOTCHAS
The following points are not bugs, and they have good reasons for existing, but
they have been shown to cause confusion.  Each is discussed in the appropriate
section above.
.Bl -enum
.It
.Nm
drives are
.Ux
disk partitions and must have the partition type
.Em vinum .
This is different from
.Xr ccd 4 ,
which expects partitions of type
.Em 4.2BSD .
This behaviour of
.Nm ccd
is an invitation to shoot yourself in the foot: with
.Nm ccd
you can easily overwrite a file system.
.Nm
will not permit this.
.Pp
For similar reasons, the
.Nm Ic start
command will not accept a drive on partition
.Dq Li c .
Partition
.Dq Li c
is used by the system to represent the whole disk, and must be of type
.Em unused .
Clearly there is a conflict here, which
.Nm
resolves by not using the
.Dq Li c
partition.
.It
When you create a volume with multiple plexes,
.Nm
does not automatically initialize the plexes.  This means that the contents are
not known, but they are certainly not consistent.  As a result, by default
.Nm
sets the state of all newly-created plexes except the first to
.Em faulty .
In order to synchronize them with the first plex, you must
.Ic start
them, which causes
.Nm
to copy the data from a plex which is in the
.Em up
state.  Depending on the size of the subdisks involved, this can take a long
time.
.Pp
In practice, people aren't too interested in what was in the plex when it was
created, and other volume managers cheat by setting them
.Em up
anyway.
.Nm
provides two ways to ensure that newly created plexes are
.Em up :
.Bl -bullet
.It
Create the plexes and then synchronize them with
.Nm Ic start .
.It
Create the volume (not the plex) with the keyword
.Cm setupstate ,
which tells
.Nm
to ignore any possible inconsistency and set the plexes to be
.Em up .
.El
.It
Some of the commands currently supported by
.Nm
are not really needed.  For reasons which I don't understand, however, I find
that users frequently try the
.Ic label
and
.Ic resetconfig
commands, though especially
.Ic resetconfig
outputs all sorts of dire warnings.  Don't use these commands unless you have a
good reason to do so.
.It
Some state transitions are not very intuitive.  In fact, it's not clear whether
this is a bug or a feature.  If you find that you can't start an object in some
strange state, such as a
.Em reborn
subdisk, try first to get it into
.Em stopped
state, with the
.Ic stop
or
.Ic stop Fl f
commands.  If that works, you should then be able to start it.  If you find
that this is the only way to get out of a position where easier methods fail,
please report the situation.
.It
If you build the kernel module with the
.Fl D Ns Dv VINUMDEBUG
option, you must also build
.Nm
with the
.Fl D Ns Dv VINUMDEBUG
option, since the size of some data objects used by both components depends on
this option.  If you don't do so, commands will fail with the message
.Sy Invalid argument ,
and a console message will be logged such as
.Bl -diag
.It "vinumioctl: invalid ioctl from process 247 (vinum): c0e44642"
.El
.Pp
This error may also occur if you use an old version of either the KLD or the
userland program.
.It
The
.Nm Ic read
command has a particularly emetic syntax.  Once it was the only way to start
.Nm ,
but now the preferred method is with
.Nm Ic start .
.Nm Ic read
should be used for maintenance purposes only.  Note that its syntax has changed,
and the arguments must be disk slices, such as
.Pa /dev/da0 ,
not partitions such as
.Pa /dev/da0e .
.El
.\"XXX.Sh BUGS
.Sh FILES
.Bl -tag -width /dev/vinum/control -compact
.It Pa /dev/vinum
directory with device nodes for
.Nm
objects
.It Pa /dev/vinum/control
control device for
.Nm
.It Pa /dev/vinum/plex
directory containing device nodes for
.Nm
plexes
.It Pa /dev/vinum/sd
directory containing device nodes for
.Nm
subdisks
.El
.Sh ENVIRONMENT
.Bl -tag -width VINUM_DATEFORMAT
.It Ev VINUM_HISTORY
The name of the log file, by default
.Pa /var/log/vinum_history .
.It Ev VINUM_DATEFORMAT
The format of dates in the log file, by default
.Qq Li %e %b %Y %H:%M:%S .
.It Ev EDITOR
The name of the editor to use for editing configuration files, by default
.Nm vi .
.El
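.Pp
The way these variables combine with their defaults can be sketched as follows.
The Python code is an illustration only and is not how
.Nm
itself is implemented; the function names are hypothetical.  It formats the
timestamp in UTC to keep the result reproducible, and the
.Li %e
conversion in the default format depends on the platform's
.Xr strftime 3 .

```python
import os
import time

DEFAULT_HISTORY = "/var/log/vinum_history"
DEFAULT_DATEFORMAT = "%e %b %Y %H:%M:%S"


def history_settings(environ=os.environ):
    """Return (log file, date format), falling back to the documented defaults."""
    return (
        environ.get("VINUM_HISTORY", DEFAULT_HISTORY),
        environ.get("VINUM_DATEFORMAT", DEFAULT_DATEFORMAT),
    )


def format_timestamp(seconds, environ=os.environ):
    """Format a UNIX timestamp with the configured date format (UTC here)."""
    _, date_format = history_settings(environ)
    return time.strftime(date_format, time.gmtime(seconds))


if __name__ == "__main__":
    log_file, date_format = history_settings()
    print(log_file, repr(date_format))
    print(format_timestamp(time.time()))
```

Setting
.Ev VINUM_DATEFORMAT
to a value such as
.Qq Li %Y-%m-%d %H:%M:%S
changes only the second element returned by the hypothetical
.Fn history_settings
helper.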
.Sh SEE ALSO
.Xr strftime 3 ,
.Xr vinum 4 ,
.Xr disklabel 8 ,
.Xr newfs 8
.Pp
.Pa http://www.vinumvm.org/vinum/ ,
.Pa http://www.vinumvm.org/vinum/how-to-debug.html .
.Sh AUTHORS
.An Greg Lehey Aq grog@lemis.com
.Sh HISTORY
The
.Nm
command first appeared in
.Fx 3.0 .
The RAID-5 component of
.Nm
was developed for Cybernet Inc.\&
.Pq Pa www.cybernet.com
for its NetMAX product.