1.\"  Hey, Emacs, edit this file in -*- nroff-fill -*- mode
2.\"-
3.\" Copyright (c) 1997, 1998
4.\"	Nan Yang Computer Services Limited.  All rights reserved.
5.\"
6.\"  This software is distributed under the so-called ``Berkeley
7.\"  License'':
8.\"
9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions
11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright
13.\"    notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\"    notice, this list of conditions and the following disclaimer in the
16.\"    documentation and/or other materials provided with the distribution.
17.\" 3. All advertising materials mentioning features or use of this software
18.\"    must display the following acknowledgement:
19.\"	This product includes software developed by Nan Yang Computer
20.\"      Services Limited.
21.\" 4. Neither the name of the Company nor the names of its contributors
22.\"    may be used to endorse or promote products derived from this software
23.\"    without specific prior written permission.
24.\"
25.\" This software is provided ``as is'', and any express or implied
26.\" warranties, including, but not limited to, the implied warranties of
27.\" merchantability and fitness for a particular purpose are disclaimed.
28.\" In no event shall the company or contributors be liable for any
29.\" direct, indirect, incidental, special, exemplary, or consequential
30.\" damages (including, but not limited to, procurement of substitute
31.\" goods or services; loss of use, data, or profits; or business
32.\" interruption) however caused and on any theory of liability, whether
33.\" in contract, strict liability, or tort (including negligence or
34.\" otherwise) arising in any way out of the use of this software, even if
35.\" advised of the possibility of such damage.
36.\"
37.\" $FreeBSD: src/share/man/man4/vinum.4,v 1.22.2.9 2002/04/22 08:19:35 kuriyama Exp $
38.\" $DragonFly: src/share/man/man4/vinum.4,v 1.5 2004/07/08 00:14:49 hmp Exp $
39.\"
40.Dd October 5, 1999
41.Dt vinum 4
42.Os
43.Sh NAME
44.Nm vinum
45.Nd Logical Volume Manager
46.Sh SYNOPSIS
47.Cd "kldload vinum"
48.Cd "kldload Vinum"
49.Sh DESCRIPTION
50.Nm
51is a logical volume manager inspired by, but not derived from, the Veritas
52Volume Manager.  It provides the following features:
53.Bl -bullet
54.It
55It provides device-independent logical disks, called \fIvolumes\fP.  Volumes are
56not restricted to the size of any disk on the system.
57.It
The volumes consist of one or more \fIplexes\fP, each of which contains the
59entire address space of a volume.  This represents an implementation of RAID-1
60(mirroring).  Multiple plexes can also be used for
61.\" XXX What about sparse plexes?  Do we want them?
62.if t .sp
63.Bl -bullet
64.It
65Increased read throughput.
66.Nm
67will read data from the least active disk, so if a volume has plexes on multiple
68disks, more data can be read in parallel.
69.Nm
70reads data from only one plex, but it writes data to all plexes.
71.It
72Increased reliability.  By storing plexes on different disks, data will remain
73available even if one of the plexes becomes unavailable.  In comparison with a
74RAID-5 plex (see below), using multiple plexes requires more storage space, but
75gives better performance, particularly in the case of a drive failure.
76.It
77Additional plexes can be used for on-line data reorganization.  By attaching an
78additional plex and subsequently detaching one of the older plexes, data can be
79moved on-line without compromising access.
80.It
81An additional plex can be used to obtain a consistent dump of a file system.  By
82attaching an additional plex and detaching at a specific time, the detached plex
83becomes an accurate snapshot of the file system at the time of detachment.
84.\" Make sure to flush!
85.El
86.It
87Each plex consists of one or more logical disk slices, called \fIsubdisks\fP.
Each subdisk is a contiguous block of physical disk storage.  A plex may
89consist of any reasonable number of subdisks (in other words, the real limit is
90not the number, but other factors, such as memory and performance, associated
91with maintaining a large number of subdisks).
92.It
93A number of mappings between subdisks and plexes are available:
94.Bl -bullet
95.It
96\fIConcatenated plexes\fP\| consist of one or more subdisks, each of which
97is mapped to a contiguous part of the plex address space.
98.It
\fIStriped plexes\fP\| consist of two or more subdisks of equal size.  The plex
address space is mapped in \fIstripes\fP, integral fractions of the subdisk
101size.  Consecutive plex address space is mapped to stripes in each subdisk in
102.if n turn.
103.if t \{\
104turn.
105.ig
106.\" FIXME
107.br
108.ne 1.5i
109.PS
110move right 2i
111down
112SD0: box
113SD1: box
114SD2: box
115
116"plex 0" at SD0.n+(0,.2)
117"subdisk 0" rjust at SD0.w-(.2,0)
118"subdisk 1" rjust at SD1.w-(.2,0)
119"subdisk 2" rjust at SD2.w-(.2,0)
120.PE
121..
122.\}
123The subdisks of a striped plex must all be the same size.
124.It
125\fIRAID-5 plexes\fP\| require at least three equal-sized subdisks.  They
126resemble striped plexes, except that in each stripe, one subdisk stores parity
127information.  This subdisk changes in each stripe: in the first stripe, it is the
128first subdisk, in the second it is the second subdisk, etc.  In the event of a
129single disk failure,
130.Nm
131will recover the data based on the information stored on the remaining subdisks.
132This mapping is particularly suited to read-intensive access.  The subdisks of a
133RAID-5 plex must all be the same size.
134.\" Make sure to flush!
135.El
136.It
137.Nm Drives
138are the lowest level of the storage hierarchy.  They represent disk special
139devices.
140.It
141.Nm
142offers automatic startup.  Unlike UNIX file systems,
143.Nm
144volumes contain all the configuration information needed to ensure that they are
145started correctly when the subsystem is enabled.  This is also a significant
advantage over the Veritas\(tm File System.  This feature concerns only the
presence of the volumes.  It does not mean that the volumes will be mounted
148automatically, since the standard startup procedures with
149.Pa /etc/fstab
150perform this function.
151.El
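.Pp
The striped and RAID-5 mappings described above can be sketched with
.Xr sh 1
arithmetic.
This is an illustration only, not the driver's code: the function names are
hypothetical, all quantities are in blocks, and the RAID-5 helper assumes that
the parity position wraps around after the last subdisk.
.Bd -literal -offset indent
# Map a block offset within a striped plex to a subdisk.
# usage: stripe_map OFFSET STRIPESIZE NSUBDISKS  (all in blocks)
# prints: SUBDISK-INDEX OFFSET-IN-SUBDISK
stripe_map() {
    stripe=$(( $1 / $2 ))       # which stripe, counted across the plex
    sd=$(( stripe % $3 ))       # stripes go to each subdisk in turn
    row=$(( stripe / $3 ))      # complete rounds already laid down
    echo "$sd $(( row * $2 + $1 % $2 ))"
}

# Index of the subdisk holding parity for a given RAID-5 stripe:
# stripe 0 uses the first subdisk, stripe 1 the second, and so on.
# usage: raid5_parity_sd STRIPE NSUBDISKS
raid5_parity_sd() {
    echo $(( $1 % $2 ))
}
.Ed
.Pp
For example, with four subdisks and a stripe size of 128 blocks,
.Dq Li stripe_map 600 128 4
prints
.Dq Li 0 216 :
block 600 lies in the fifth stripe, which wraps around to the first subdisk.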
152.Sh KERNEL CONFIGURATION
153.Nm
154is currently supplied as a kernel loadable module (kld), and does not require
155configuration.  As with other klds, it is absolutely necessary to match the kld
156to the version of the operating system.  Failure to do so will cause
157.Nm
158to issue an error message and terminate.
159.Pp
160It is possible to configure
161.Nm
162in the kernel, but this is not recommended.  To do so, add this line to the
163kernel configuration file:
164.Bd -literal -offset indent
165pseudo-device	vinum
166.Ed
167.Pp
168.Ss DEBUG OPTIONS
169The current version of
170.Nm ,
171both the kernel module and the user program
172.Xr vinum 8 ,
173include significant debugging support.  It is not recommended to remove
this support at the moment, but if you do, you must remove it from both the
175kernel and the user components.  To do this, edit the files
176.Pa /usr/src/sbin/vinum/Makefile
177and
178.Pa /sys/dev/raid/vinum/Makefile
179and edit the CFLAGS variable to remove the -DVINUMDEBUG option.  If you have
180configured
181.Nm
182into the kernel, either specify the line
183.Bd -literal -offset indent
184options		VINUMDEBUG
185.Ed
186.Pp
187in the kernel configuration file or remove the -DVINUMDEBUG option from
188.Pa /usr/src/sbin/vinum/Makefile
189as described above.
190.Pp
191If the VINUMDEBUG variables do not match,
192.Xr vinum 8
193will fail with a message
194explaining the problem and what to do to correct it.
195.Pp
196.Nm
197was previously available in two versions: a freely available version which did
198not contain RAID-5 functionality, and a full version including RAID-5
199functionality, which was available only from Cybernet Systems Inc.  The present
200version of
201.Nm
202includes the RAID-5 functionality.
203.Sh RUNNING VINUM
204.Nm
205is part of the base
206.Dx
207system.  It does not require installation.
208To start it, start the
209.Nm
210program, which will load the kld if it is not already present.
211Before using
212.Nm ,
213it must be configured.  See
214.Xr vinum 8
215for information on how to create a
216.Nm
217configuration.
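.Pp
As a minimal sketch of such a configuration, the following description file
defines one volume with two concatenated plexes on different drives, that is,
a mirror.
The drive names, device paths and sizes are illustrative assumptions; see
.Xr vinum 8
for the authoritative syntax.
.Bd -literal -offset indent
drive drive1 device /dev/da1h
drive drive2 device /dev/da2h
volume mirror
  plex org concat
    sd length 512m drive drive1
  plex org concat
    sd length 512m drive drive2
.Ed
.Pp
The file would be handed to the
.Cm create
command of
.Xr vinum 8 .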
218.Pp
219Normally, you start a configured version of
220.Nm
221at boot time.  Set the variable
222.Ar start_vinum
223in
224.Pa /etc/rc.conf
225to
226.Ar YES
227to start
228.Nm
229at boot time.
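.Pp
The corresponding line in
.Pa /etc/rc.conf
would read:
.Bd -literal -offset indent
start_vinum="YES"
.Ed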
230.Pp
231If
232.Nm
233is loaded as a kld (the recommended way), the
234.Nm
235.Ar stop
236command will unload it.  You can also do this with the
237.Nm kldunload
238command.
239.Pp
240The kld can only be unloaded when idle, in other words when no volumes are
241mounted and no other instances of the
242.Nm
243program are active.  Unloading the kld does not harm the data in the volumes.
244.Ss CONFIGURING AND STARTING OBJECTS
245Use the
246.Xr vinum 8
247utility to configure and start
248.Nm
249objects.
250.Sh IOCTL CALLS
251.Pa ioctl
252calls are intended for the use of the
253.Nm
254configuration program only.  They are described in the header file
.Pa /sys/dev/raid/vinum/vinumio.h .
256.Ss DISK LABELS
257Conventional disk special devices have a
258.Em disk label
259in the second sector of the device.  See
260.Xr disklabel 5
261for more details.  This disk label describes the layout of the partitions within
262the device.
263.Nm
264does not subdivide volumes, so volumes do not contain a physical disk label.
265For convenience,
266.Nm
267implements the ioctl calls DIOCGDINFO (get disk label), DIOCGPART (get partition
268information), DIOCWDINFO (write partition information) and DIOCSDINFO (set
269partition information).  DIOCGDINFO and DIOCGPART refer to an internal
270representation of the disk label which is not present on the volume.  As a
271result, the
272.Fl r
273option of
274.Xr disklabel 8 ,
275which reads the
276.if t ``raw disk'',
277.if n "raw disk",
278will fail.
279.Pp
280In general,
281.Xr disklabel 8
282serves no useful purpose on a vinum volume.  If you run it, it will show you
283three partitions, a, b and c, all the same except for the fstype, for example:
284.br
285.ne 1i
.Bd -literal -offset indent
2873 partitions:
288#        size   offset    fstype   [fsize bsize bps/cpg]
289  a:     2048        0    4.2BSD     1024  8192     0   # (Cyl.    0 - 0)
290  b:     2048        0      swap                        # (Cyl.    0 - 0)
291  c:     2048        0    unused        0     0         # (Cyl.    0 - 0)
292.Ed
293.Pp
294.Nm
295ignores the DIOCWDINFO and DIOCSDINFO ioctls, since there is nothing to change.
296As a result, any attempt to modify the disk label will be silently ignored.
297.Sh MAKING FILE SYSTEMS
298Since
299.Nm
300volumes do not contain partitions, the names do not need to conform to the
301standard rules for naming disk partitions.  For a physical disk partition, the
302last letter of the device name specifies the partition identifier (a to h).
303.Nm
304volumes need not conform to this convention, but if they do not,
305.Nm newfs
306will complain that it cannot determine the partition.  To solve this problem,
307use the
308.Fl v
309flag to
310.Nm newfs .
311For example, if you have a volume
312.Pa concat ,
313use the following command to create a ufs file system on it:
314.Pp
315.Bd -literal
316  # newfs -v /dev/vinum/concat
317.Ed
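.Pp
As noted in DESCRIPTION, mounting the new file system at boot time is left to
.Pa /etc/fstab .
A hypothetical entry for this volume, assuming a mount point of
.Pa /mnt ,
might look like:
.Bd -literal -offset indent
/dev/vinum/concat	/mnt	ufs	rw	2	2
.Ed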
318.Pp
319.Sh OBJECT NAMING
320.Nm
321assigns default names to plexes and subdisks, although they may be overridden.
322We do not recommend overriding the default names.  Experience with the
323.if t Veritas\(tm
324.if n Veritas(tm)
volume manager, which allows arbitrary naming of objects, has shown that this
326flexibility does not bring a significant advantage, and it can cause confusion.
327.sp
328Names may contain any non-blank character, but it is recommended to restrict
them to letters, digits and the underscore character.  The names of volumes,
plexes and subdisks may be up to 64 characters long, and the names of drives may
be up to 32 characters long.  When choosing volume and plex names, bear in mind
332that automatically generated plex and subdisk names are longer than the name
333from which they are derived.
334.Bl -bullet
335.It
336When
337.Xr vinum 8
338creates or deletes objects, it creates a directory
339.Pa /dev/vinum ,
340in which it makes device entries for each volume.  It also creates the
341subdirectories
342.Pa /dev/vinum/plex
343and
344.Pa /dev/vinum/sd ,
345in which it stores device entries for the plexes and subdisks.  In addition, it
346creates two more directories,
347.Pa /dev/vinum/vol
348and
349.Pa /dev/vinum/drive ,
350in which it stores hierarchical information for volumes and drives.
351.It
352In addition,
353.Nm
354creates three super-devices,
355.Pa /dev/vinum/control ,
356.Pa /dev/vinum/Control
357and
358.Pa /dev/vinum/controld .
359.Pa /dev/vinum/control
360is used by
361.Xr vinum 8
362when it has been compiled without the VINUMDEBUG option,
363.Pa /dev/vinum/Control
364is used by
365.Xr vinum 8
366when it has been compiled with the VINUMDEBUG option,
367and
368.Pa /dev/vinum/controld
369is used by the
370.Nm
371daemon.  The two control devices for
372.Xr vinum 8
373are used to synchronize the debug status of kernel and user modules.
374.It
375Unlike
.Ux
377drives,
378.Nm
379volumes are not subdivided into partitions, and thus do not contain a disk
380label.  Unfortunately, this confuses a number of utilities, notably
381.Nm newfs ,
382which normally tries to interpret the last letter of a
383.Nm
384volume name as a partition identifier.  If you use a volume name which does not
385end in the letters
386.Ar a
387to
388.Ar c ,
389you must use the
390.Fl v
391flag to
392.Nm newfs
393in order to tell it to ignore this convention.
394.\"
395.It
396Plexes do not need to be assigned explicit names.  By default, a plex name is
397the name of the volume followed by the letters \f(CW.p\fR and the number of the
398plex.  For example, the plexes of volume
399.Ar vol3
400are called
401.Ar vol3.p0 ,
402.Ar vol3.p1
403and so on.  These names can be overridden, but it is not recommended.
404.br
405.It
406Like plexes, subdisks are assigned names automatically, and explicit naming is
407discouraged.  A subdisk name is the name of the plex followed by the letters
408\f(CW\&.s\fR and a number identifying the subdisk.  For example, the subdisks of
409plex
410.Ar vol3.p0
411are called
412.Ar vol3.p0.s0 ,
413.Ar vol3.p0.s1
414and so on.
415.br
416.It
417By contrast,
418.Nm drives
419must be named.  This makes it possible to move a drive to a different location
420and still recognize it automatically.  Drive names may be up to 32 characters
421long.
422.El
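.Pp
The default naming rules above can be sketched as follows.
The helper names are hypothetical, and the length check simply takes the
64-character limit quoted above at face value.
.Bd -literal -offset indent
# Default names as described above: VOLUME.pN and PLEX.sN.
plex_name() { echo "$1.p$2"; }
sd_name()   { echo "$1.s$2"; }

# Succeeds if NAME fits the 64-character limit for volumes,
# plexes and subdisks.
name_fits() { [ "${#1}" -le 64 ]; }
.Ed
.Pp
For example,
.Dq Li sd_name \&"$(plex_name vol3 0)\&" 1
prints
.Dq Li vol3.p0.s1 .
Since each level adds a suffix, a volume name close to the limit leaves no room
for the generated plex and subdisk names.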
423.Pp
424EXAMPLE
425.Pp
426Assume the
427.Nm
428objects described in the section CONFIGURATION FILE in
429.Xr vinum 8 .
430The directory
431.Ar /dev/vinum
432looks like:
433.Bd -literal -offset indent
434# ls -lR /dev/vinum
435total 5
436crwxr-xr--  1 root  wheel   91,   2 Mar 30 16:08 concat
437crwx------  1 root  wheel   91, 0x40000000 Mar 30 16:08 control
438crwx------  1 root  wheel   91, 0x40000001 Mar 30 16:08 controld
439drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 drive
440drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 plex
441drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 rvol
442drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 sd
443crwxr-xr--  1 root  wheel   91,   3 Mar 30 16:08 strcon
444crwxr-xr--  1 root  wheel   91,   1 Mar 30 16:08 stripe
445crwxr-xr--  1 root  wheel   91,   0 Mar 30 16:08 tinyvol
446drwxrwxrwx  7 root  wheel       512 Mar 30 16:08 vol
447crwxr-xr--  1 root  wheel   91,   4 Mar 30 16:08 vol5
448
449/dev/vinum/drive:
450total 0
451crw-r-----  1 root  operator    4,  15 Oct 21 16:51 drive2
452crw-r-----  1 root  operator    4,  31 Oct 21 16:51 drive4
453
454/dev/vinum/plex:
455total 0
456crwxr-xr--  1 root  wheel   91, 0x10000002 Mar 30 16:08 concat.p0
457crwxr-xr--  1 root  wheel   91, 0x10010002 Mar 30 16:08 concat.p1
458crwxr-xr--  1 root  wheel   91, 0x10000003 Mar 30 16:08 strcon.p0
459crwxr-xr--  1 root  wheel   91, 0x10010003 Mar 30 16:08 strcon.p1
460crwxr-xr--  1 root  wheel   91, 0x10000001 Mar 30 16:08 stripe.p0
461crwxr-xr--  1 root  wheel   91, 0x10000000 Mar 30 16:08 tinyvol.p0
462crwxr-xr--  1 root  wheel   91, 0x10000004 Mar 30 16:08 vol5.p0
463crwxr-xr--  1 root  wheel   91, 0x10010004 Mar 30 16:08 vol5.p1
464
465/dev/vinum/sd:
466total 0
467crwxr-xr--  1 root  wheel   91, 0x20000002 Mar 30 16:08 concat.p0.s0
468crwxr-xr--  1 root  wheel   91, 0x20100002 Mar 30 16:08 concat.p0.s1
469crwxr-xr--  1 root  wheel   91, 0x20010002 Mar 30 16:08 concat.p1.s0
470crwxr-xr--  1 root  wheel   91, 0x20000003 Mar 30 16:08 strcon.p0.s0
471crwxr-xr--  1 root  wheel   91, 0x20100003 Mar 30 16:08 strcon.p0.s1
472crwxr-xr--  1 root  wheel   91, 0x20010003 Mar 30 16:08 strcon.p1.s0
473crwxr-xr--  1 root  wheel   91, 0x20110003 Mar 30 16:08 strcon.p1.s1
474crwxr-xr--  1 root  wheel   91, 0x20000001 Mar 30 16:08 stripe.p0.s0
475crwxr-xr--  1 root  wheel   91, 0x20100001 Mar 30 16:08 stripe.p0.s1
476crwxr-xr--  1 root  wheel   91, 0x20000000 Mar 30 16:08 tinyvol.p0.s0
477crwxr-xr--  1 root  wheel   91, 0x20100000 Mar 30 16:08 tinyvol.p0.s1
478crwxr-xr--  1 root  wheel   91, 0x20000004 Mar 30 16:08 vol5.p0.s0
479crwxr-xr--  1 root  wheel   91, 0x20100004 Mar 30 16:08 vol5.p0.s1
480crwxr-xr--  1 root  wheel   91, 0x20010004 Mar 30 16:08 vol5.p1.s0
481crwxr-xr--  1 root  wheel   91, 0x20110004 Mar 30 16:08 vol5.p1.s1
482
483/dev/vinum/vol:
484total 5
485crwxr-xr--  1 root  wheel   91,   2 Mar 30 16:08 concat
486drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 concat.plex
487crwxr-xr--  1 root  wheel   91,   3 Mar 30 16:08 strcon
488drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 strcon.plex
489crwxr-xr--  1 root  wheel   91,   1 Mar 30 16:08 stripe
490drwxr-xr-x  3 root  wheel       512 Mar 30 16:08 stripe.plex
491crwxr-xr--  1 root  wheel   91,   0 Mar 30 16:08 tinyvol
492drwxr-xr-x  3 root  wheel       512 Mar 30 16:08 tinyvol.plex
493crwxr-xr--  1 root  wheel   91,   4 Mar 30 16:08 vol5
494drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 vol5.plex
495
496/dev/vinum/vol/concat.plex:
497total 2
498crwxr-xr--  1 root  wheel   91, 0x10000002 Mar 30 16:08 concat.p0
499drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 concat.p0.sd
500crwxr-xr--  1 root  wheel   91, 0x10010002 Mar 30 16:08 concat.p1
501drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 concat.p1.sd
502
503/dev/vinum/vol/concat.plex/concat.p0.sd:
504total 0
505crwxr-xr--  1 root  wheel   91, 0x20000002 Mar 30 16:08 concat.p0.s0
506crwxr-xr--  1 root  wheel   91, 0x20100002 Mar 30 16:08 concat.p0.s1
507
508/dev/vinum/vol/concat.plex/concat.p1.sd:
509total 0
510crwxr-xr--  1 root  wheel   91, 0x20010002 Mar 30 16:08 concat.p1.s0
511
512/dev/vinum/vol/strcon.plex:
513total 2
514crwxr-xr--  1 root  wheel   91, 0x10000003 Mar 30 16:08 strcon.p0
515drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 strcon.p0.sd
516crwxr-xr--  1 root  wheel   91, 0x10010003 Mar 30 16:08 strcon.p1
517drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 strcon.p1.sd
518
519/dev/vinum/vol/strcon.plex/strcon.p0.sd:
520total 0
521crwxr-xr--  1 root  wheel   91, 0x20000003 Mar 30 16:08 strcon.p0.s0
522crwxr-xr--  1 root  wheel   91, 0x20100003 Mar 30 16:08 strcon.p0.s1
523
524/dev/vinum/vol/strcon.plex/strcon.p1.sd:
525total 0
526crwxr-xr--  1 root  wheel   91, 0x20010003 Mar 30 16:08 strcon.p1.s0
527crwxr-xr--  1 root  wheel   91, 0x20110003 Mar 30 16:08 strcon.p1.s1
528
529/dev/vinum/vol/stripe.plex:
530total 1
531crwxr-xr--  1 root  wheel   91, 0x10000001 Mar 30 16:08 stripe.p0
532drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 stripe.p0.sd
533
534/dev/vinum/vol/stripe.plex/stripe.p0.sd:
535total 0
536crwxr-xr--  1 root  wheel   91, 0x20000001 Mar 30 16:08 stripe.p0.s0
537crwxr-xr--  1 root  wheel   91, 0x20100001 Mar 30 16:08 stripe.p0.s1
538
539/dev/vinum/vol/tinyvol.plex:
540total 1
541crwxr-xr--  1 root  wheel   91, 0x10000000 Mar 30 16:08 tinyvol.p0
542drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 tinyvol.p0.sd
543
544/dev/vinum/vol/tinyvol.plex/tinyvol.p0.sd:
545total 0
546crwxr-xr--  1 root  wheel   91, 0x20000000 Mar 30 16:08 tinyvol.p0.s0
547crwxr-xr--  1 root  wheel   91, 0x20100000 Mar 30 16:08 tinyvol.p0.s1
548
549/dev/vinum/vol/vol5.plex:
550total 2
551crwxr-xr--  1 root  wheel   91, 0x10000004 Mar 30 16:08 vol5.p0
552drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 vol5.p0.sd
553crwxr-xr--  1 root  wheel   91, 0x10010004 Mar 30 16:08 vol5.p1
554drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 vol5.p1.sd
555
556/dev/vinum/vol/vol5.plex/vol5.p0.sd:
557total 0
558crwxr-xr--  1 root  wheel   91, 0x20000004 Mar 30 16:08 vol5.p0.s0
559crwxr-xr--  1 root  wheel   91, 0x20100004 Mar 30 16:08 vol5.p0.s1
560
561/dev/vinum/vol/vol5.plex/vol5.p1.sd:
562total 0
563crwxr-xr--  1 root  wheel   91, 0x20010004 Mar 30 16:08 vol5.p1.s0
564crwxr-xr--  1 root  wheel   91, 0x20110004 Mar 30 16:08 vol5.p1.s1
565.Ed
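.Pp
The minor numbers in this listing appear to encode the position of each object
in the hierarchy.
The following sketch splits a minor number into fields whose positions are read
off the listing above; it is an inference, not a documented interface, and the
field widths and the function name are assumptions.
.Bd -literal -offset indent
# Split a vinum minor number into its apparent fields.
# Field positions are inferred from the listing; widths are assumed.
decode_minor() {
    m=$(( $1 ))
    t=$(( m >> 28 ))              # 0 volume, 1 plex, 2 subdisk, 4 control
    sd=$(( (m >> 20) & 0xff ))    # subdisk number within the plex
    plex=$(( (m >> 16) & 0x7 ))   # plex number within the volume
    vol=$(( m & 0xff ))           # volume number
    echo "type=$t sd=$sd plex=$plex vol=$vol"
}
.Ed
.Pp
Applied to the entry for
.Pa strcon.p1.s1 ,
.Dq Li decode_minor 0x20110003
prints
.Dq Li type=2 sd=1 plex=1 vol=3 ,
which matches the minor number 3 of the volume
.Pa strcon .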
566.Pp
567In the case of unattached plexes and subdisks, the naming is reversed.  Subdisks
568are named after the disk on which they are located, and plexes are named after
569the subdisk.
570.\" XXX
This mapping is still to be determined.
572.Ss OBJECT STATES
573.Pp
574Each
575.Nm
576object has a \fIstate\fR associated with it.
577.Nm
578uses this state to determine the handling of the object.
579.Pp
580.Ss VOLUME STATES
581Volumes may have the following states:
582.sp
583.Bl -hang -width 14n
584.It Li down
585The volume is completely inaccessible.
586.It Li up
587The volume is up and at least partially functional.  Not all plexes may be
588available.
589.El
590.Ss "PLEX STATES"
591Plexes may have the following states:
592.sp
593.ne 1i
594.Bl -hang -width 14n
595.It Li referenced
596A plex entry which has been referenced as part of a volume, but which is
597currently not known.
598.It Li faulty
599A plex which has gone completely down because of I/O errors.
600.It Li down
601A plex which has been taken down by the administrator.
602.It Li initializing
603A plex which is being initialized.
604.sp
605The remaining states represent plexes which are at least partially up.
606.It Li corrupt
607A plex entry which is at least partially up.  Not all subdisks are available,
608and an inconsistency has occurred.  If no other plex is uncorrupted, the volume
609is no longer consistent.
610.It Li degraded
611A RAID-5 plex entry which is accessible, but one subdisk is down, requiring
612recovery for many I/O requests.
613.It Li flaky
614A plex which is really up, but which has a reborn subdisk which we don't
615completely trust, and which we don't want to read if we can avoid it.
616.It Li up
617A plex entry which is completely up.  All subdisks are up.
618.El
619.sp 2v
620.Ss "SUBDISK STATES"
621Subdisks can have the following states:
622.sp
623.ne 1i
624.Bl -hang -width 14n
625.It Li empty
626A subdisk entry which has been created completely.  All fields are correct, and
the disk has been updated, but the data on the disk is not valid.
628.It Li referenced
629A subdisk entry which has been referenced as part of a plex, but which is
630currently not known.
631.It Li initializing
632A subdisk entry which has been created completely and which is currently being
633initialized.
634.sp
635The following states represent invalid data.
636.It Li obsolete
637A subdisk entry which has been created completely.  All fields are correct, the
638config on disk has been updated, and the data was valid, but since then the
639drive has been taken down, and as a result updates have been missed.
640.It Li stale
641A subdisk entry which has been created completely.  All fields are correct, the
disk has been updated, and the data was valid, but since then the drive has
crashed and updates have been lost.
644.sp
645The following states represent valid, inaccessible data.
646.It Li crashed
647A subdisk entry which has been created completely.  All fields are correct, the
648disk has been updated, and the data was valid, but since then the drive has gone
649down.  No attempt has been made to write to the subdisk since the crash, so the
650data is valid.
651.It Li down
652A subdisk entry which was up, which contained valid data, and which was taken
653down by the administrator.  The data is valid.
654.It Li reviving
655The subdisk is currently in the process of being revived.  We can write but not
656read.
657.sp
658The following states represent accessible subdisks with valid data.
659.It Li reborn
660A subdisk entry which has been created completely.  All fields are correct, the
661disk has been updated, and the data was valid, but since then the drive has gone
662down and up again.  No updates were lost, but it is possible that the subdisk
663has been damaged.  We won't read from this subdisk if we have a choice.  If this
664is the only subdisk which covers this address space in the plex, we set its
665state to up under these circumstances, so this status implies that there is
666another subdisk to fulfil the request.
667.It Li up
668A subdisk entry which has been created completely.  All fields are correct, the
669disk has been updated, and the data is valid.
670.El
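.Pp
The grouping of subdisk states in the list above can be summarized as a lookup.
This is only a reading aid: the function name is hypothetical, and the label
for the first group, which the list leaves unnamed, is an assumption.
.Bd -literal -offset indent
# Classify a subdisk state by the groups used in the list above.
sd_state_class() {
    case "$1" in
        empty|referenced|initializing) echo "not yet in service" ;;
        obsolete|stale)                echo "invalid data" ;;
        crashed|down|reviving)         echo "valid data, inaccessible" ;;
        reborn|up)                     echo "accessible" ;;
        *)                             echo "unknown"; return 1 ;;
    esac
}
.Ed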
671.sp 2v
672.Ss "DRIVE STATES"
673Drives can have the following states:
674.sp
675.ne 1i
676.Bl -hang -width 14n
677.It Li referenced
678At least one subdisk refers to the drive, but it is not currently accessible to
679the system.  No device name is known.
680.It Li down
681The drive is not accessible.
682.It Li up
683The drive is up and running.
684.El
685.sp 2v
686.Sh BUGS
687.Bl -enum
688.It
689.Nm
690is a new product.  Bugs can be expected.  The configuration mechanism is not yet
691fully functional.  If you have difficulties, please look at the section
692DEBUGGING PROBLEMS WITH VINUM before reporting problems.
693.It
694Kernels with the
695.Nm
696pseudo-device appear to work, but are not supported.  If you have trouble with
697this configuration, please first replace the kernel with a non-Vinum
698kernel and test with the kld module.
699.It
700Detection of differences between the version of the kernel and the kld is not
701yet implemented.
702.It
703The RAID-5 functionality is new in
704.Fx 3.3 .
705Some problems have been
706reported with
707.Nm
708in combination with soft updates, but these are not reproducible on all
709systems.  If you are planning to use
710.Nm
711in a production environment, please test carefully.
712.El
713.Sh DEBUGGING PROBLEMS WITH VINUM
714Solving problems with
715.Nm
716can be a difficult affair.  This section suggests some approaches.
717.Ss Configuration problems
718.Pp
719It is relatively easy (too easy) to run into problems with the
720.Nm
721configuration.  If you do, the first thing you should do is stop configuration
722updates:
723.if t .ps -3
724.if t .vs -3
725.Bd -literal
726# \fBvinum setdaemon 4\fP
727.Ed
728.if t .vs
729.if t .ps
730.Pp
731This will stop updates and any further corruption of the on-disk configuration.
732.Pp
733Next, look at the on-disk configuration with the
734.Nm vinum dumpconfig
735command, for example:
736.if t .ps -3
737.if t .vs -3
738.Bd -literal
739# \fBvinum dumpconfig\fP
740Drive 4:        Device /dev/da3h
741                Created on crash.lemis.com at Sat May 20 16:32:44 2000
742                Config last updated Sat May 20 16:32:56 2000
743                Size:        601052160 bytes (573 MB)
744volume obj state up
745volume src state up
746volume raid state down
747volume r state down
748volume foo state up
749plex name obj.p0 state corrupt org concat vol obj
750plex name obj.p1 state corrupt org striped 128b vol obj
751plex name src.p0 state corrupt org striped 128b vol src
752plex name src.p1 state up org concat vol src
753plex name raid.p0 state faulty org disorg vol raid
754plex name r.p0 state faulty org disorg vol r
755plex name foo.p0 state up org concat vol foo
756plex name foo.p1 state faulty org concat vol foo
757sd name obj.p0.s0 drive drive2 plex obj.p0 state reborn len 409600b driveoffset 265b plexoffset 0b
758sd name obj.p0.s1 drive drive4 plex obj.p0 state up len 409600b driveoffset 265b plexoffset 409600b
759sd name obj.p1.s0 drive drive1 plex obj.p1 state up len 204800b driveoffset 265b plexoffset 0b
760sd name obj.p1.s1 drive drive2 plex obj.p1 state reborn len 204800b driveoffset 409865b plexoffset 128b
761sd name obj.p1.s2 drive drive3 plex obj.p1 state up len 204800b driveoffset 265b plexoffset 256b
762sd name obj.p1.s3 drive drive4 plex obj.p1 state up len 204800b driveoffset 409865b plexoffset 384b
763.Ed
764.if t .vs
765.if t .ps
766.Pp
767The configuration on all disks should be the same.  If this is not the case,
768please save the output to a file and report the problem.  There is probably
769little that can be done to recover the on-disk configuration, but if you keep a
770copy of the files used to create the objects, you should be able to re-create
771them.  The
772.Cm create
773command does not change the subdisk data, so this will not cause data
774corruption.  You may need to use the
775.Cm resetconfig
776command if you have this kind of trouble.
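.Pp
To pick the problem objects out of a dump like the one above, the output can be
filtered.
The following is a minimal sketch which assumes the line layout shown in the
example output; the helper name
.Li not_up
is hypothetical and is not part of
.Xr vinum 8 .
.Bd -literal -offset indent
# Print type, name and state of every plex or subdisk that is not "up".
not_up() {
    awk '($1 == "plex" || $1 == "sd") && $2 == "name" {
        for (i = 3; i < NF; i++)
            if ($i == "state" && $(i + 1) != "up")
                print $1, $3, $(i + 1)
    }'
}
.Ed
.Pp
It would be used as
.Dq Li "vinum dumpconfig | not_up" .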
777.Ss Kernel Panics
778.Pp
779In order to analyse a panic which you suspect comes from
780.Nm
781you will need to build a debug kernel.  See the online handbook at
782.Pa /usr/share/doc/en/books/developers-handbook/kerneldebug.html
783(if installed) or
784.Pa http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
785for more details of how to do this.
786.Pp
787Perform the following steps to analyse a
788.Nm
789problem:
790.Bl -enum
791.It
792Copy the files
793.Pa /sys/dev/raid/vinum/.gdbinit.crash ,
794.Pa /sys/dev/raid/vinum/.gdbinit.kernel ,
795.Pa /sys/dev/raid/vinum/.gdbinit.serial ,
796.Pa /sys/dev/raid/vinum/.gdbinit.vinum
797and
798.Pa /sys/dev/raid/vinum/.gdbinit.vinum.paths
799to the directory in which you will be performing the analysis, typically
800.Pa /var/crash .
801.It
802Make sure that you build the
803.Nm
804module with debugging information.  The standard
805.Pa Makefile
806builds a module with debugging symbols by default.  If the version of
807.Nm
808in
809.Pa /modules
810does not contain symbols, you will not get an error message, but the stack trace
811will not show the symbols.  Check the module before starting
812.Nm gdb :
813.Bd -literal
814$ file /modules/vinum.ko
815/modules/vinum.ko: ELF 32-bit LSB shared object, Intel 80386,
816  version 1 (FreeBSD), not stripped
817.Ed
818.Pp
819If the output shows that
820.Pa /modules/vinum.ko
821is stripped, you will have to find a version which is not.  Usually this will be
822either in
823.Pa /usr/obj/usr/src/sys/SYSTEM_NAME/usr/src/sys/dev/raid/vinum/vinum.ko
824(if you have built
825.Nm
826with a
827.Ar make world )
828or
829.Pa /sys/dev/raid/vinum/vinum.ko
830(if you have built
831.Nm
832in this directory).  Modify the file
833.Pa .gdbinit.vinum.paths
834accordingly.
835.It
836Either take a dump or use remote serial
837.Cm gdb
838to analyse the problem.  To analyse a dump, say
839.Pa /var/crash/vmcore.5 ,
840link
841.Pa /var/crash/.gdbinit.crash
842to
843.Pa /var/crash/.gdbinit
844and enter:
845.Bd -literal
846# cd /var/crash
847# gdb -k kernel.debug vmcore.5
848.Ed
849.Pp
850This example assumes that you have installed the correct debug kernel at
851.Pa /var/crash/kernel.debug .
852If not, substitute the correct name of the debug kernel.
853.Pp
854To perform remote serial debugging,
855link
856.Pa /var/crash/.gdbinit.serial
857to
858.Pa /var/crash/.gdbinit
859and enter
860.Bd -literal
861# cd /var/crash
862# gdb -k kernel.debug
863.Ed
864.Pp
865In this case, the
866.Pa .gdbinit
867file performs the functions necessary to establish connection.  The remote
868machine must already be in debug mode: enter the kernel debugger and select
869.Nm gdb .
870The serial
871.Pa .gdbinit
872file expects the serial connection to run at 38400 bits per second; if you run
873at a different speed, edit the file accordingly (look for the
874.Ar remotebaud
875specification).
876.Pp
877The following example shows a remote debugging session using the
878.Ar debug
879command of
880.Xr vinum 8 :
881.if t .ps -3
882.if t .vs -3
883.Bd -literal
884GDB 4.16 (i386-unknown-dragonfly), Copyright 1996 Free Software Foundation, Inc.
885Debugger (msg=0xf1093174 "vinum debug") at ../../i386/i386/db_interface.c:318
886318                 in_Debugger = 0;
887#1  0xf108d9bc in vinumioctl (dev=0x40001900, cmd=0xc008464b, data=0xf6dedee0 "",
888    flag=0x3, p=0xf68b7940) at
889    /usr/src/sys/dev/raid/vinum/vinumioctl.c:102
890102             Debugger ("vinum debug");
891(kgdb) bt
892#0  Debugger (msg=0xf0f661ac "vinum debug") at ../../i386/i386/db_interface.c:318
893#1  0xf0f60a7c in vinumioctl (dev=0x40001900, cmd=0xc008464b, data=0xf6923ed0 "",
894      flag=0x3, p=0xf688e6c0) at
895      /usr/src/sys/dev/raid/vinum/vinumioctl.c:109
896#2  0xf01833b7 in spec_ioctl (ap=0xf6923e0c) at ../../miscfs/specfs/spec_vnops.c:424
897#3  0xf0182cc9 in spec_vnoperate (ap=0xf6923e0c) at ../../miscfs/specfs/spec_vnops.c:129
898#4  0xf01eb3c1 in ufs_vnoperatespec (ap=0xf6923e0c) at ../../ufs/ufs/ufs_vnops.c:2312
899#5  0xf017dbb1 in vn_ioctl (fp=0xf1007ec0, com=0xc008464b, data=0xf6923ed0 "",
900      p=0xf688e6c0) at vnode_if.h:395
901#6  0xf015dce0 in ioctl (p=0xf688e6c0, uap=0xf6923f84) at ../../kern/sys_generic.c:473
902#7  0xf0214c0b in syscall (frame={tf_es = 0x27, tf_ds = 0x27, tf_edi = 0xefbfcff8,
903      tf_esi = 0x1, tf_ebp = 0xefbfcf90, tf_isp = 0xf6923fd4, tf_ebx = 0x2,
904      tf_edx = 0x804b614, tf_ecx = 0x8085d10, tf_eax = 0x36, tf_trapno = 0x7,
905      tf_err = 0x2, tf_eip = 0x8060a34, tf_cs = 0x1f, tf_eflags = 0x286,
906      tf_esp = 0xefbfcf78, tf_ss = 0x27}) at ../../i386/i386/trap.c:1100
907#8  0xf020a1fc in Xint0x80_syscall ()
908#9  0x804832d in ?? ()
909#10 0x80482ad in ?? ()
910#11 0x80480e9 in ?? ()
911.Ed
912.if t .vs
913.if t .ps
914.Pp
915When entering from the debugger, it's important that the source of frame 1
916(listed by the
917.Pa .gdbinit
918file at the top of the example) contains the text
919.if t .ps -3
920.if t .vs -3
921.Bd -literal
922Debugger ("vinum debug");
923.Ed
924.if t .vs
925.if t .ps
926.Pp
927This is an indication that the address specifications are correct.  If you get
928some other output, your symbols and the kernel module are out of sync, and the
929trace will be meaningless.
930.El
931.Pp
932For an initial investigation, the most important information is the output of
933the
934.Nm bt
935(backtrace) command above.
936.Ss Reporting problems with Vinum
937.Pp
938If you find any bugs in
939.Nm ,
940please report them to Greg Lehey <grog@lemis.com>.  Supply the following
941information:
942.Pp
943.Bl -bullet
944.It
945The output of the
946.Nm
947.Cm list
948command.
949.It
950Any messages printed in
951.Pa /var/log/messages .
952All such messages will be identified by the text
953.Nm
954at the beginning.
955.It
956If you have a panic, a stack trace as described above.
957.El
958.Sh AUTHORS
959.An Greg Lehey Aq grog@lemis.com .
960.Sh HISTORY
961.Nm
962first appeared in
963.Fx 3.0 .
964The RAID-5 component of
965.Nm
was developed by Cybernet Inc.\&
.Pq Pa www.cybernet.com
968for its NetMAX product.
969.Sh SEE ALSO
970.Xr disklabel 5 ,
971.Xr disklabel 8 ,
972.Xr newfs 8 ,
973.Xr vinum 8
974