.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode
.\"-
.\" Copyright (c) 1997, 1998
.\" Nan Yang Computer Services Limited. All rights reserved.
.\"
.\" This software is distributed under the so-called ``Berkeley
.\" License'':
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the Company nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" This software is provided ``as is'', and any express or implied
.\" warranties, including, but not limited to, the implied warranties of
.\" merchantability and fitness for a particular purpose are disclaimed.
.\" In no event shall the company or contributors be liable for any
.\" direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute
.\" goods or services; loss of use, data, or profits; or business
.\" interruption) however caused and on any theory of liability, whether
.\" in contract, strict liability, or tort (including negligence or
.\" otherwise) arising in any way out of the use of this software, even if
.\" advised of the possibility of such damage.
.\"
.\" $Id: vinum.8,v 1.48 2001/01/15 22:15:05 grog Exp $
.\" $FreeBSD: src/sbin/vinum/vinum.8,v 1.33.2.10 2002/12/29 16:35:38 schweikh Exp $
.\" $DragonFly: src/sbin/vinum/vinum.8,v 1.8 2007/11/04 16:34:55 swildner Exp $
.\"
.Dd August 11, 2007
.Dt VINUM 8
.Os
.Sh NAME
.Nm vinum
.Nd Logical Volume Manager control program
.Sh SYNOPSIS
.Nm
.Op Ar command
.Op Fl options
.Sh COMMANDS
.Bl -tag -width indent
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
Attach a plex to a volume, or a subdisk to a plex.
.It Xo
.Ic checkparity Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks of a RAID-4 or RAID-5 plex.
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a concatenated volume from the specified drives.
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
Create a volume as described in
.Ar description-file .
.It Ic debug
Cause the volume manager to enter the kernel debugger.
.It Ic debug Ar flags
Set debugging flags.
.It Xo
.Ic detach
.Op Fl f
.Op Ar plex | subdisk
.Xc
Detach a plex or subdisk from the volume or plex to which it is attached.
.It Ic dumpconfig Op Ar drive ...
List the configuration information stored on the specified drives, or all drives
in the system if no drive names are specified.
.It Xo
.Ic info
.Op Fl v
.Op Fl V
.Xc
List information about volume manager state.
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.\" XXX
Initialize the contents of a subdisk or all the subdisks of a plex to all zeros.
.It Ic label Ar volume
Create a volume label.
.It Xo
.Ic l | list
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
List information about specified objects.
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
List information about drives.
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
List information about subdisks.
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
List information about plexes.
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
List information about volumes.
.It Ic makedev
Remake the device nodes in
.Pa /dev/vinum .
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
Create a mirrored volume from the specified drives.
.It Xo
.Ic move | mv
.Fl f
.Ar drive object ...
.Xc
Move the object(s) to the specified drive.
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file .
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.It Ic read Ar disk ...
Read the
.Nm
configuration from the specified disks.
.It Xo
.Ic rename Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object.
.\" XXX
.\".It Ic replace Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive.
.It Xo
.Ic rebuildparity Ar plex Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks of a RAID-4 or RAID-5 plex.
.It Ic resetconfig
Reset the complete
.Nm
configuration.
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
Reset statistics counters for the specified objects, or for all objects if none
are specified.
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
Remove an object.
.It Ic saveconfig
Save
.Nm
configuration to disk after configuration failures.
.\" XXX
.\".It Xo
.\".Ic set
.\".Op Fl f
.\".Ar state
.\".Ar volume | plex | subdisk | disk
.\".Xc
.\"Set the state of the object to
.\".Ar state .
.It Ic setdaemon Op Ar value
Set daemon configuration.
.It Xo
.Ic setstate
.Ar state
.Op Ar volume | plex | subdisk | drive
.Xc
Set state without influencing other objects, for diagnostic purposes only.
.It Ic start
Read configuration from all vinum drives.
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Ar volume | plex | subdisk
.Xc
Allow the system to access the objects.
.It Xo
.Ic stop
.Op Fl f
.Op Ar volume | plex | subdisk
.Xc
Terminate access to the objects, or stop
.Nm
if no parameters are specified.
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a striped volume from the specified drives.
.El
.Sh DESCRIPTION
.Nm
is a utility program to communicate with the
.Xr vinum 4
logical volume
manager.
.Nm
is designed either for interactive use, when started without command line
arguments, or to execute a single command if the command is supplied on the
command line. In interactive mode,
.Nm
maintains a command line history.
.Sh OPTIONS
.Nm
commands may optionally be followed by an option. Any of the following options
may be specified with any command, but in some cases the options are ignored.
For example, the
.Ic stop
command ignores the
.Fl v
and
.Fl V
options.
.Bl -tag -width indent
.It Fl f
The
.Fl f
.Pq Dq force
option overrides safety checks. Use with extreme care. This option is for
emergency use only. For example, the command
.Pp
.Dl rm -f myvolume
.Pp
removes
.Ar myvolume
even if it is open.
Any subsequent access to the volume will almost certainly
cause a panic.
.It Fl i Ar millisecs
When performing the
.Ic init
and
.Ic start
commands, wait
.Ar millisecs
milliseconds between copying each block. This lowers the load on the system.
.It Fl n Ar name
Use the
.Fl n
option to specify a volume name to the simplified configuration commands
.Ic concat , mirror
and
.Ic stripe .
.It Fl r
The
.Fl r
.Pq Dq recursive
option is used by the list commands to display information not
only about the specified objects, but also about subordinate objects. For
example, in conjunction with the
.Ic lv
command, the
.Fl r
option will also show information about the plexes and subdisks belonging to the
volume.
.It Fl s
The
.Fl s
.Pq Dq statistics
option is used by the list commands to display statistical information. The
.Ic mirror
command also uses this option to specify that it should create striped plexes.
.It Fl S Ar size
The
.Fl S
option specifies the transfer size for the
.Ic init
and
.Ic start
commands.
.It Fl v
The
.Fl v
.Pq Dq verbose
option can be used to request more detailed information.
.It Fl V
The
.Fl V
.Pq Dq Very verbose
option can be used to request more detailed information than the
.Fl v
option provides.
.It Fl w
The
.Fl w
.Pq Dq wait
option tells
.Nm
to wait for completion of commands which normally run in the background, such as
.Ic init .
.El
.Sh COMMANDS IN DETAIL
.Nm
commands perform the following functions:
.Pp
.Bl -tag -width indent -compact
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
.Nm Ic attach
inserts the specified plex or subdisk in a volume or plex. In the case of a
subdisk, an offset in the plex may be specified.
If it is not, the subdisk will
be attached at the first possible location. After attaching a plex to a
non-empty volume,
.Nm
reintegrates the plex.
.Pp
If the keyword
.Cm rename
is specified,
.Nm
renames the object (and in the case of a plex, any subordinate subdisks) to fit
in with the default
.Nm
naming convention. To rename the object to any other name, use the
.Ic rename
command.
.Pp
A number of considerations apply to attaching subdisks:
.Bl -bullet
.It
Subdisks can normally only be attached to concatenated plexes.
.It
If a striped or RAID-5 plex is missing a subdisk (for example after drive
failure), it should be replaced by a subdisk of the same size only.
.It
In order to add further subdisks to a striped or RAID-5 plex, use the
.Fl f
(force) option. This will corrupt the data in the plex.
.\"No other attachment of
.\"subdisks is currently allowed for striped and RAID-5 plexes.
.It
For concatenated plexes, the
.Ar offset
parameter specifies the offset in blocks from the beginning of the plex. For
striped and RAID-5 plexes, it specifies the offset of the first block of the
subdisk: in other words, the offset is the numerical position of the subdisk
multiplied by the stripe size. For example, in a plex with stripe size 271k,
the first subdisk will have offset 0, the second offset 271k, the third 542k,
etc. This calculation ignores parity blocks in RAID-5 plexes.
.El
.Pp
.It Xo
.Ic checkparity
.Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks on the specified RAID-4 or RAID-5 plex. This operation
maintains a pointer in the plex, so it can be stopped and later restarted from
the same position if desired. In addition, this pointer is used by the
.Ic rebuildparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
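.Pp
As a minimal sketch of this interaction (the plex name
.Li raid5vol.p0
is hypothetical and only serves as an illustration), a check followed by a
rebuild that picks up at the saved pointer might look like this in interactive
mode:
.Bd -literal -offset indent
vinum -> checkparity raid5vol.p0
vinum -> rebuildparity raid5vol.p0
.Ed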
.Pp
If the
.Fl f
flag is specified,
.Ic checkparity
starts checking at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic checkparity
prints a running progress report.
.Pp
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic concat
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single concatenated plex. The largest
contiguous space available on each drive is used to create the subdisks for the
plex.
.Pp
Normally, the
.Ic concat
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plex and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
.Nm Ic create
is used to create any object. In view of the relatively complicated
relationship and the potential dangers involved in creating a
.Nm
object, there is no interactive interface to this function. If you do not
specify a file name,
.Nm
starts an editor on a temporary file.
If the environment variable
.Ev EDITOR
is set,
.Nm
starts this editor. If not, it defaults to
.Nm vi .
See the section
.Sx CONFIGURATION FILE
below for more information on the format of
this file.
.Pp
Note that the
.Nm Ic create
function is additive: if you run it multiple times, you will create multiple
copies of all unnamed objects.
.Pp
Normally the
.Ic create
command will not change the names of existing
.Nm
drives, in order to avoid accidentally erasing them. The correct way to dispose
of no longer wanted
.Nm
drives is to reset the configuration with the
.Ic resetconfig
command. In some cases, however, it may be necessary to create new data on
.Nm
drives which can no longer be started. In this case, use the
.Ic create Fl f
command.
.Pp
.It Ic debug
.Nm Ic debug ,
without any arguments, is used to enter the remote kernel debugger. It is only
activated if
.Nm
is built with the
.Dv VINUMDEBUG
option. This option will stop the execution of the operating system until the
kernel debugger is exited. If remote debugging is set and there is no remote
connection for a kernel debugger, it will be necessary to reset the system and
reboot in order to leave the debugger.
.Pp
.It Ic debug Ar flags
Set a bit mask of internal debugging flags. These will change without warning
as the product matures; to be certain, read the header file
.Pa /sys/dev/raid/vinum/vinumvar.h .
The bit mask is composed of the following values:
.Bl -tag -width indent
.It Dv DEBUG_ADDRESSES Pq No 1
Show buffer information during requests
.\".It Dv DEBUG_NUMOUTPUT Pq No 2
.\"Show the value of
.\".Va vp->v_numoutput .
.It Dv DEBUG_RESID Pq No 4
Go into debugger in
.Fn complete_rqe .
.It Dv DEBUG_LASTREQS Pq No 8
Keep a circular buffer of last requests.
.It Dv DEBUG_REVIVECONFLICT Pq No 16
Print info about revive conflicts.
.It Dv DEBUG_EOFINFO Pq No 32
Print information about internal state when returning an
.Dv EOF
on a striped plex.
.It Dv DEBUG_MEMFREE Pq No 64
Maintain a circular list of the last memory areas freed by the memory allocator.
.It Dv DEBUG_REMOTEGDB Pq No 256
Go into remote
.Nm gdb
when the
.Ic debug
command is issued.
.It Dv DEBUG_WARNINGS Pq No 512
Print some warnings about minor problems in the implementation.
.El
.Pp
.It Ic detach Oo Fl f Oc Ar plex
.It Ic detach Oo Fl f Oc Ar subdisk
.Nm Ic detach
removes the specified plex or subdisk from the volume or plex to which it is
attached. If removing the object would impair the data integrity of the volume,
the operation will fail unless the
.Fl f
option is specified. If the object is named after the object above it (for
example, subdisk
.Li vol1.p7.s0
attached to plex
.Li vol1.p7 ) ,
the name will be changed
by prepending the text
.Dq Li ex-
(for example,
.Li ex-vol1.p7.s0 ) .
If necessary, the name will be truncated in the
process.
.Pp
.Ic detach
does not reduce the number of subdisks in a striped or RAID-5 plex. Instead,
the subdisk is marked absent, and can later be replaced with the
.Ic attach
command.
.Pp
.It Ic dumpconfig Op Ar drive ...
.Pp
.Nm Ic dumpconfig
shows the configuration information stored on the specified drives. If no drive
names are specified,
.Ic dumpconfig
searches all drives on the system for Vinum partitions and dumps the
information. If configuration updates are disabled, it is possible that this
information is not the same as the information returned by the
.Ic list
command. This command is used primarily for maintenance and debugging.
.Pp
.It Ic info
.Nm Ic info
displays information about
.Nm
memory usage. This is intended primarily for debugging.
With the
.Fl v
option, it will give detailed information about the memory areas in use.
.Pp
With the
.Fl V
option,
.Ic info
displays information about the last up to 64 I/O requests handled by the
.Nm
driver. This information is only collected if debug flag 8 is set. The format
looks like:
.Bd -literal
vinum -> info -V
Flags: 0x200    1 opens
Total of 38 blocks malloced, total memory: 16460
Maximum allocs:      56, malloc table at 0xf0f72dbc

Time            Event     Buf         Dev    Offset    Bytes  SD  SDoff  Doffset  Goffset

14:40:00.637758 1VS Write 0xf2361f40  91.3   0x10      16384
14:40:00.639280 2LR Write 0xf2361f40  91.3   0x10      16384
14:40:00.639294 3RQ Read  0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.639455 3RQ Read  0xf2361f40  4.23   0xd2109   8192   17  0      0        0
14:40:00.639529 3RQ Read  0xf2361f40  4.15   0x6e109   8192   16  0      0        0
14:40:00.652978 4DN Read  0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.667040 4DN Read  0xf2361f40  4.15   0x6e109   8192   16  0      0        0
14:40:00.668556 4DN Read  0xf2361f40  4.23   0xd2109   8192   17  0      0        0
14:40:00.669777 6RP Write 0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.685547 4DN Write 0xf2361f40  4.39   0x104109  8192   19  0      0        0
11:11:14.975184 Lock      0xc2374210  2      0x1f8001
11:11:15.018400 7VS Write 0xc2374210         0x7c0     32768  10
11:11:15.018456 8LR Write 0xc2374210  13.39  0xcc0c9   32768
11:11:15.046229 Unlock    0xc2374210  2      0x1f8001
.Ed
.Pp
The
.Ar Buf
field always contains the address of the user buffer header. This can be used
to identify the requests associated with a user request, though this is not 100%
reliable: theoretically two requests in sequence could use the same buffer
header, though this is not common. The beginning of a request can be identified
by the event
.Ar 1VS
or
.Ar 7VS .
The first example above shows the requests involved in a user request. The
second is a subdisk I/O request with locking.
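.Pp
A minimal sketch of how such a trace might be collected, using only the
.Ic debug
and
.Ic info
commands described in this page: set debug flag 8
.Pq Dv DEBUG_LASTREQS
so that requests are recorded, perform some I/O on a volume, then list the
recorded requests.
.Bd -literal -offset indent
vinum -> debug 8
vinum -> info -V
.Ed
.Pp
Since the argument to
.Ic debug
is a bit mask, flag values combine by addition: for example, 8 + 512 = 520
would select both
.Dv DEBUG_LASTREQS
and
.Dv DEBUG_WARNINGS .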
.Pp
The
.Ar Event
field contains information related to the sequence of events in the request
chain. The digit
.Ar 1
to
.Ar 8
indicates the approximate sequence of events, and the two-letter abbreviation is
a mnemonic for the location:
.Bl -tag -width Lockwait
.It 1VS
(vinumstrategy) shows information about the user request on entry to
.Fn vinumstrategy .
The device number is the
.Nm
device, and offset and length are the user parameters. This is always the
beginning of a request sequence.
.It 2LR
(launch_requests) shows the user request just prior to launching the low-level
.Nm
requests in the function
.Fn launch_requests .
The parameters should be the same as in the
.Ar 1VS
information.
.El
.Pp
In the following requests,
.Ar Dev
is the device number of the associated disk partition,
.Ar Offset
is the offset from the beginning of the partition,
.Ar SD
is the subdisk index in
.Va vinum_conf ,
.Ar SDoff
is the offset from the beginning of the subdisk,
.Ar Doffset
is the offset of the associated data request, and
.Ar Goffset
is the offset of the associated group request, where applicable.
.Bl -tag -width Lockwait
.It 3RQ
(request) shows one of possibly several low-level
.Nm
requests which are launched to satisfy the high-level request. This information
is also logged in
.Fn launch_requests .
.It 4DN
(done) is called from
.Fn complete_rqe ,
showing the completion of a request. This completion should match a request
launched either at stage
.Ar 3RQ
from
.Fn launch_requests ,
or from
.Fn complete_raid5_write
at stage
.Ar 5RD
or
.Ar 6RP .
.It 5RD
(RAID-5 data) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 data stripe after calculating
parity.
.It 6RP
(RAID-5 parity) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 parity stripe after calculating
parity.
.It 7VS
shows a subdisk I/O request. These requests are usually internal to
.Nm
for operations like initialization or rebuilding plexes.
.It 8LR
shows the low-level operation generated for a subdisk I/O request.
.It Lockwait
specifies that the process is waiting for a range lock. The parameters are the
buffer header associated with the request, the plex number and the block number.
For internal reasons the block number is one higher than the address of the
beginning of the stripe.
.It Lock
specifies that a range lock has been obtained. The parameters are the same as
for
.Li Lockwait .
.It Unlock
specifies that a range lock has been released. The parameters are the same as
for
.Li Lockwait .
.El
.\" XXX
.Pp
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.Nm Ic init
initializes a subdisk by writing zeroes to it. You can initialize all subdisks
in a plex by specifying the plex name. This is the only way to ensure
consistent data in a plex. You must perform this initialization before using a
RAID-5 plex. It is also recommended for other new plexes.
.Nm
initializes all subdisks of a plex in parallel. Since this operation can take a
long time, it is normally performed in the background. If you want to wait for
completion of the command, use the
.Fl w
(wait) option.
.Pp
Specify the
.Fl S
option if you want to write blocks of a different size from the default value of
16 kB.
.Nm
prints a console message when the initialization is complete.
.Pp
.It Ic label Ar volume
The
.Ic label
command writes a
.Xr UFS 5
style volume label on a volume. It is a simple alternative to an appropriate
call to
.Ic disklabel .
This is needed because some
.Xr UFS 5
commands still read the disk to find the label instead of using the correct
.Xr ioctl 2
call to access it.
.Nm
maintains a volume label separately from the volume data, so this command is not
needed for
.Xr newfs 8 .
This command is deprecated.
.Pp
.It Xo
.Ic list
.Op Fl r
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic l
.Op Fl r
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
.Ic list
is used to show information about the specified object. If the argument is
omitted, information is shown about all objects known to
.Nm .
The
.Ic l
command is a synonym for
.Ic list .
.Pp
The
.Fl r
option relates to volumes and plexes: if specified, it recursively lists
information for the subdisks and (for a volume) plexes subordinate to the
objects. The commands
.Ic lv , lp , ls
and
.Ic ld
list only volumes, plexes, subdisks and drives respectively. This is
particularly useful when used without parameters.
.Pp
The
.Fl s
option causes
.Nm
to output device statistics, the
.Fl v
(verbose) option causes some additional information to be output, and the
.Fl V
option causes considerable additional information to be output.
.Pp
.It Ic makedev
The
.Ic makedev
command removes the directory
.Pa /dev/vinum
and recreates it with device nodes
which reflect the current configuration. This command is not intended for
general use, and is provided for emergency use only.
.Pp
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
The
.Ic mirror
command provides a simplified alternative to the
.Ic create
command for creating mirrored volumes. Without any options, it creates a RAID-1
(mirrored) volume with two concatenated plexes. The largest contiguous space
available on each drive is used to create the subdisks for the plexes. The
first plex is built from the odd-numbered drives in the list, and the second
plex is built from the even-numbered drives. If the drives are of different
sizes, the plexes will be of different sizes.
.Pp
If the
.Fl s
option is provided,
.Ic mirror
builds striped plexes with a stripe size of 256 kB. The size of the subdisks in
each plex is the size of the smallest contiguous storage available on any of the
drives which form the plex. Again, the plexes may differ in size.
.Pp
Normally, the
.Ic mirror
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Ic mv Fl f Ar drive object ...
.It Ic move Fl f Ar drive object ...
Move all the subdisks from the specified objects onto the new drive. The
objects may be subdisks, drives or plexes. When drives or plexes are specified,
all subdisks associated with the object are moved.
.Pp
The
.Fl f
option is required for this function, since it currently does not preserve the
data in the subdisk. This functionality will be added at a later date. In this
form, however, it is suited to recovering a failed disk drive.
.Pp
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file
in a format that can be used to recreate the
.Nm
configuration. Unlike the configuration saved on disk, it includes definitions
of the drives. If you omit
.Ar file ,
.Nm
writes the list to
.Dv stdout .
.Pp
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.Pp
.It Ic read Ar disk ...
The
.Ic read
command scans the specified disks for
.Nm
partitions containing previously created configuration information. It reads
the configuration in order from the most recently updated to least recently
updated configuration.
.Nm
maintains an up-to-date copy of all configuration information on each disk
partition. You must specify all of the slices in a configuration as the
parameter to this command.
.Pp
The
.Ic read
command is intended to selectively load a
.Nm
configuration on a system which has other
.Nm
partitions. If you want to start all partitions on the system, it is easier to
use the
.Ic start
command.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk.
This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification). You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands. Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
.It Xo
.Ic rebuildparity
.Ar plex
.Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks on the specified RAID-4 or RAID-5 plex. This
operation maintains a pointer in the plex, so it can be stopped and later
restarted from the same position if desired. In addition, this pointer is used
by the
.Ic checkparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
.Pp
If the
.Fl f
flag is specified,
.Ic rebuildparity
starts rebuilding at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic rebuildparity
first checks the existing parity blocks and prints information about those found
to be incorrect before rebuilding. If the
.Fl V
flag is specified,
.Ic rebuildparity
prints a running progress report.
.Pp
.It Xo
.Ic rename
.Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object. If the
.Fl r
option is specified, subordinate objects will be named by the default rules:
plex names will be formed by appending
.Li .p Ns Ar number
to the volume name, and
subdisk names will be formed by appending
.Li .s Ns Ar number
to the plex name.
.\".Pp
.\".It Xo
.\".Ic replace
.\".Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive. This will
.\"attempt to recover those subdisks that can be recovered, and create the others
.\"from scratch.
.\"If the new drive lacks the space for this operation, as many
.\"subdisks as possible will be fitted onto the drive, and the rest will be left on
.\"the original drive.
.Pp
.It Ic resetconfig
The
.Ic resetconfig
command completely obliterates the
.Nm
configuration on a system. Use this command only when you want to completely
delete the configuration.
.Nm
will ask for confirmation; you must type in the words
.Li "NO FUTURE"
exactly as shown:
.Bd -unfilled -offset indent
.No # Nm Ic resetconfig

WARNING! This command will completely wipe out your vinum
configuration. All data will be lost. If you really want
to do this, enter the text

NO FUTURE
.No "Enter text ->" Sy "NO FUTURE"
Vinum configuration obliterated
.Ed
.Pp
As the message suggests, this is a last-ditch command. Don't use it unless you
have an existing configuration which you never want to see again.
.Pp
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
.Nm
maintains a number of statistical counters for each object. See the header file
.Pa /sys/dev/raid/vinum/vinumvar.h
for more information.
.\" XXX put it in here when it's finalized
Use the
.Ic resetstats
command to reset these counters. In conjunction with the
.Fl r
option,
.Nm
also resets the counters of subordinate objects.
.Pp
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
.Ic rm
removes an object from the
.Nm
configuration. Once an object has been removed, there is no way to recover it.
Normally
.Nm
performs a large amount of consistency checking before removing an object. The
.Fl f
option tells
.Nm
to omit this checking and remove the object anyway. Use this option with great
care: it can result in total loss of data on a volume.
1134.Pp 1135Normally, 1136.Nm 1137refuses to remove a volume or plex if it has subordinate plexes or subdisks 1138respectively. You can tell 1139.Nm 1140to remove the object anyway by using the 1141.Fl f 1142option, or you can cause 1143.Nm 1144to remove the subordinate objects as well by using the 1145.Fl r 1146(recursive) option. If you remove a volume with the 1147.Fl r 1148option, it will remove both the plexes and the subdisks which belong to the 1149plexes. 1150.Pp 1151.It Ic saveconfig 1152Save the current configuration to disk. Normally this is not necessary, since 1153.Nm 1154automatically saves any change in configuration. If an error occurs on startup, 1155updates will be disabled. When you reenable them with the 1156.Ic setdaemon 1157command, 1158.Nm 1159does not automatically save the configuration to disk. Use this command to save 1160the configuration. 1161.\".Pp 1162.\".It Xo 1163.\".Ic set 1164.\".Op Fl f 1165.\".Ar state 1166.\".Ar volume | plex | subdisk | disk 1167.\".Xc 1168.\".Ic set 1169.\"sets the state of the specified object to one of the valid states (see 1170.\".Sx OBJECT STATES 1171.\"below). Normally 1172.\".Nm 1173.\"performs a large amount of consistency checking before making the change. The 1174.\".Fl f 1175.\"option tells 1176.\".Nm 1177.\"to omit this checking and perform the change anyway. Use this option with great 1178.\"care: it can result in total loss of data on a volume. 1179.Pp 1180.It Ic setdaemon Op Ar value 1181.Ic setdaemon 1182sets a variable bitmask for the 1183.Nm 1184daemon. This command is temporary and will be replaced. Currently, the bit mask 1185may contain the bits 1 (log every action to syslog) and 4 (don't update 1186configuration). Option bit 4 can be useful for error recovery. 1187.Pp 1188.It Xo 1189.Ic setstate Ar state 1190.Op Ar volume | plex | subdisk | drive 1191.Xc 1192.Ic setstate 1193sets the state of the specified objects to the specified state. 
This bypasses
the usual consistency mechanism of
.Nm
and should be used only for recovery purposes. It is possible to crash the
system by incorrect use of this command.
.Pp
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Op Ar plex | subdisk
.Xc
.Ic start
starts (brings into the
.Em up
state) one or more
.Nm
objects.
.Pp
If no object names are specified,
.Nm
scans the disks known to the system for
.Nm
drives and then reads in the configuration as described under the
.Ic read
command. The
.Nm
drive contains a header with all information about the data stored on the drive,
including the names of the other drives which are required in order to represent
plexes and volumes.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk. This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification). You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands. Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
If object names are specified,
.Nm
starts them. Normally this operation is only of use with subdisks. The action
depends on the current state of the object:
.Bl -bullet
.It
If the object is already in the
.Em up
state,
.Nm
does nothing.
.It
If the object is a subdisk in the
.Em down
or
.Em reborn
state,
.Nm
changes it to the
.Em up
state.
.It
If the object is a subdisk in the
.Em empty
state, the change depends on the subdisk.
If it is part of a plex which is part 1263of a volume which contains other plexes, 1264.Nm 1265places the subdisk in the 1266.Em reviving 1267state and attempts to copy the data from the volume. When the operation 1268completes, the subdisk is set into the 1269.Em up 1270state. If it is part of a plex which is part of a volume which contains no 1271other plexes, or if it is not part of a plex, 1272.Nm 1273brings it into the 1274.Em up 1275state immediately. 1276.It 1277If the object is a subdisk in the 1278.Em reviving 1279state, 1280.Nm 1281continues the revive 1282operation offline. When the operation completes, the subdisk is set into the 1283.Em up 1284state. 1285.El 1286.Pp 1287When a subdisk comes into the 1288.Em up 1289state, 1290.Nm 1291automatically checks the state of any plex and volume to which it may belong and 1292changes their state where appropriate. 1293.Pp 1294If the object is a plex, 1295.Ic start 1296checks the state of the subordinate subdisks (and plexes in the case of a 1297volume) and starts any subdisks which can be started. 1298.Pp 1299To start a plex in a multi-plex volume, the data must be copied from another 1300plex in the volume. Since this frequently takes a long time, it is normally 1301done in the background. If you want to wait for this operation to complete (for 1302example, if you are performing this operation in a script), use the 1303.Fl w 1304option. 1305.Pp 1306Copying data doesn't just take a long time, it can also place a significant load 1307on the system. You can specify the transfer size in bytes or sectors with the 1308.Fl S 1309option, and an interval (in milliseconds) to wait between copying each block with 1310the 1311.Fl i 1312option. Both of these options lessen the load on the system. 1313.Pp 1314.It Xo 1315.Ic stop 1316.Op Fl f 1317.Op Ar volume | plex | subdisk 1318.Xc 1319If no parameters are specified, 1320.Ic stop 1321removes the 1322.Nm 1323KLD and stops 1324.Xr vinum 4 . 
This can only be done if no objects are active. In particular, the
.Fl f
option does not override this requirement. Normally, the
.Ic stop
command writes the current configuration back to the drives before terminating.
This will not be possible if configuration updates are disabled, so
.Nm
will not stop if configuration updates are disabled. You can override this by
specifying the
.Fl f
option.
.Pp
The
.Ic stop
command can only work if
.Nm
has been loaded as a KLD, since it is not possible to unload a statically
configured driver.
.Nm Ic stop
will fail if
.Nm
is statically configured.
.Pp
If object names are specified,
.Ic stop
disables access to the objects. If the objects have subordinate objects, the
subordinate objects must either already be inactive (stopped or in error), or
the
.Fl r
and
.Fl f
options must be specified. This command does not remove the objects from the
configuration. They can be accessed again after a
.Ic start
command.
.Pp
By default,
.Nm
does not stop active objects. For example, you cannot stop a plex which is
attached to an active volume, and you cannot stop a volume which is open. The
.Fl f
option tells
.Nm
to omit this checking and stop the object anyway. Use this option with great
care and understanding: used incorrectly, it can result in serious data
corruption.
.Pp
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic stripe
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single striped plex. The size of the
subdisks is the size of the largest contiguous space available on all the
specified drives. The stripe size is fixed at 256 kB.
1386.Pp 1387Normally, the 1388.Ic stripe 1389command creates an arbitrary name for the volume and its components. The name 1390is composed of the text 1391.Dq Li vinum 1392and a small integer, for example 1393.Dq Li vinum3 . 1394You can override this with the 1395.Fl n Ar name 1396option, which assigns the name specified to the volume. The plexes and subdisks 1397are named after the volume in the default manner. 1398.Pp 1399There is no choice of name for the drives. If the drives have already been 1400initialized as 1401.Nm 1402drives, the name remains. Otherwise the drives are given names starting with 1403the text 1404.Dq Li vinumdrive 1405and a small integer, for example 1406.Dq Li vinumdrive7 . 1407As with the 1408.Ic create 1409command, the 1410.Fl f 1411option can be used to specify that a previous name should be overwritten. The 1412.Fl v 1413is used to specify verbose output. 1414.Pp 1415See the section 1416.Sx SIMPLIFIED CONFIGURATION 1417below for some examples of this 1418command. 1419.El 1420.Sh SIMPLIFIED CONFIGURATION 1421This section describes a simplified interface to 1422.Nm 1423configuration using the 1424.Ic concat , 1425.Ic mirror 1426and 1427.Ic stripe 1428commands. These commands create convenient configurations for some more normal 1429situations, but they are not as flexible as the 1430.Ic create 1431command. 1432.Pp 1433See above for the description of the commands. Here are some examples, all 1434performed with the same collection of disks. Note that the first drive, 1435.Pa /dev/da1s0h , 1436is smaller than the others. This has an effect on the sizes chosen for each 1437kind of subdisk. 1438.Pp 1439The following examples all use the 1440.Fl v 1441option to show the commands passed to the system, and also to list the structure 1442of the volume. Without the 1443.Fl v 1444option, these commands produce no output. 
1445.Ss Volume with a single concatenated plex 1446Use a volume with a single concatenated plex for the largest possible storage 1447without resilience to drive failures: 1448.Bd -literal 1449vinum -> concat -v /dev/da1s0h /dev/da2s0h /dev/da3s0h /dev/da4s0h 1450volume vinum0 1451 plex name vinum0.p0 org concat 1452drive vinumdrive0 device /dev/da1s0h 1453 sd name vinum0.p0.s0 drive vinumdrive0 size 0 1454drive vinumdrive1 device /dev/da2s0h 1455 sd name vinum0.p0.s1 drive vinumdrive1 size 0 1456drive vinumdrive2 device /dev/da3s0h 1457 sd name vinum0.p0.s2 drive vinumdrive2 size 0 1458drive vinumdrive3 device /dev/da4s0h 1459 sd name vinum0.p0.s3 drive vinumdrive3 size 0 1460V vinum0 State: up Plexes: 1 Size: 2134 MB 1461P vinum0.p0 C State: up Subdisks: 4 Size: 2134 MB 1462S vinum0.p0.s0 State: up PO: 0 B Size: 414 MB 1463S vinum0.p0.s1 State: up PO: 414 MB Size: 573 MB 1464S vinum0.p0.s2 State: up PO: 988 MB Size: 573 MB 1465S vinum0.p0.s3 State: up PO: 1561 MB Size: 573 MB 1466.Ed 1467.Pp 1468In this case, the complete space on all four disks was used, giving a volume 14692134 MB in size. 1470.Ss Volume with a single striped plex 1471A volume with a single striped plex may give better performance than a 1472concatenated plex, but restrictions on striped plexes can mean that the volume 1473is smaller. 
It will also not be resilient to a drive failure:
.Bd -literal
vinum -> stripe -v /dev/da1s0h /dev/da2s0h /dev/da3s0h /dev/da4s0h
drive vinumdrive0 device /dev/da1s0h
drive vinumdrive1 device /dev/da2s0h
drive vinumdrive2 device /dev/da3s0h
drive vinumdrive3 device /dev/da4s0h
volume vinum0
  plex name vinum0.p0 org striped 256k
    sd name vinum0.p0.s0 drive vinumdrive0 size 849825b
    sd name vinum0.p0.s1 drive vinumdrive1 size 849825b
    sd name vinum0.p0.s2 drive vinumdrive2 size 849825b
    sd name vinum0.p0.s3 drive vinumdrive3 size 849825b
V vinum0                State: up       Plexes:       1 Size:       1659 MB
P vinum0.p0           S State: up       Subdisks:     4 Size:       1659 MB
S vinum0.p0.s0          State: up       PO:        0  B Size:        414 MB
S vinum0.p0.s1          State: up       PO:      256 kB Size:        414 MB
S vinum0.p0.s2          State: up       PO:      512 kB Size:        414 MB
S vinum0.p0.s3          State: up       PO:      768 kB Size:        414 MB
.Ed
.Pp
In this case, the size of the subdisks has been limited to the smallest
available disk, so the resulting volume is only 1659 MB in size.
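.Pp
As a rough cross-check of these figures, the sizes follow from the subdisk size
of 849825 blocks of 512 bytes.
The arithmetic below assumes that the listing truncates sizes to whole
megabytes; that is an inference from the output, not a documented rule:
.Bd -literal -offset indent
849825 blocks * 512 bytes = 435110400 bytes   (about 414.95 MB, listed as 414 MB)
4 subdisks * 435110400    = 1740441600 bytes  (about 1659.81 MB, listed as 1659 MB)
.Ed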
.Ss Mirrored volume with two concatenated plexes
For more reliability, use a mirrored, concatenated volume:
.Bd -literal
vinum -> mirror -v -n mirror /dev/da1s0h /dev/da2s0h /dev/da3s0h /dev/da4s0h
drive vinumdrive0 device /dev/da1s0h
drive vinumdrive1 device /dev/da2s0h
drive vinumdrive2 device /dev/da3s0h
drive vinumdrive3 device /dev/da4s0h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd name mirror.p0.s0 drive vinumdrive0 size 0b
    sd name mirror.p0.s1 drive vinumdrive2 size 0b
  plex name mirror.p1 org concat
    sd name mirror.p1.s0 drive vinumdrive1 size 0b
    sd name mirror.p1.s1 drive vinumdrive3 size 0b
V mirror                State: up       Plexes:       2 Size:       1146 MB
P mirror.p0           C State: up       Subdisks:     2 Size:        988 MB
P mirror.p1           C State: up       Subdisks:     2 Size:       1146 MB
S mirror.p0.s0          State: up       PO:        0  B Size:        414 MB
S mirror.p0.s1          State: up       PO:      414 MB Size:        573 MB
S mirror.p1.s0          State: up       PO:        0  B Size:        573 MB
S mirror.p1.s1          State: up       PO:      573 MB Size:        573 MB
.Ed
.Pp
This example specifies the name of the volume,
.Ar mirror .
Since one drive is smaller than the others, the two plexes are of different
size, and the last 158 MB of the volume is non-resilient. To ensure complete
reliability in such a situation, use the
.Ic create
command to create a volume with 988 MB.
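.Pp
A description file along the following lines is one way to do that.
It is a sketch only: it assumes the drive names assigned in the example above
and an otherwise empty set of drives, and the subdisk lengths are the listed
sizes rounded down to whole megabytes, so each plex comes out at 987 MB, just
under the 988 MB mentioned.
.Bd -literal -offset indent
# Sketch: both plexes have the same length, so the whole volume is mirrored.
# The lengths are assumptions taken from the listing above, rounded down.
drive vinumdrive0 device /dev/da1s0h
drive vinumdrive1 device /dev/da2s0h
drive vinumdrive2 device /dev/da3s0h
drive vinumdrive3 device /dev/da4s0h
volume mirror
  plex org concat
    sd length 414m drive vinumdrive0
    sd length 573m drive vinumdrive2
  plex org concat
    sd length 573m drive vinumdrive1
    sd length 414m drive vinumdrive3
.Ed
.Pp
Because this sketch omits the
.Cm setupstate
keyword, the second plex would still have to be brought up with the
.Ic start
command.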
.Ss Mirrored volume with two striped plexes
Alternatively, use the
.Fl s
option to create a mirrored volume with two striped plexes:
.Bd -literal
vinum -> mirror -v -n raid10 -s /dev/da1s0h /dev/da2s0h /dev/da3s0h /dev/da4s0h
drive vinumdrive0 device /dev/da1s0h
drive vinumdrive1 device /dev/da2s0h
drive vinumdrive2 device /dev/da3s0h
drive vinumdrive3 device /dev/da4s0h
volume raid10 setupstate
  plex name raid10.p0 org striped 256k
    sd name raid10.p0.s0 drive vinumdrive0 size 849825b
    sd name raid10.p0.s1 drive vinumdrive2 size 849825b
  plex name raid10.p1 org striped 256k
    sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
    sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
V raid10                State: up       Plexes:       2 Size:       1146 MB
P raid10.p0           S State: up       Subdisks:     2 Size:        829 MB
P raid10.p1           S State: up       Subdisks:     2 Size:       1146 MB
S raid10.p0.s0          State: up       PO:        0  B Size:        414 MB
S raid10.p0.s1          State: up       PO:      256 kB Size:        414 MB
S raid10.p1.s0          State: up       PO:        0  B Size:        573 MB
S raid10.p1.s1          State: up       PO:      256 kB Size:        573 MB
.Ed
.Pp
In this case, the usable part of the volume is even smaller, since the first
plex has shrunk to match the smallest drive.
.Sh CONFIGURATION FILE
.Nm
requires that all parameters to the
.Ic create
command be in a configuration file. Entries in the configuration file
define volumes, plexes and subdisks, and may be in free format, except that each
entry must be on a single line.
.Ss Scale factors
Some configuration file parameters specify a size (lengths, stripe sizes).
These values can be specified as bytes, or one of the following scale factors
may be appended:
.Bl -tag -width indent
.It s
specifies that the value is a number of sectors of 512 bytes.
.It k
specifies that the value is a number of kilobytes (1024 bytes).
1571.It m 1572specifies that the value is a number of megabytes (1048576 bytes). 1573.It g 1574specifies that the value is a number of gigabytes (1073741824 bytes). 1575.It b 1576is used for compatibility with 1577.Tn VERITAS . 1578It stands for blocks of 512 bytes. 1579This abbreviation is confusing, since the word 1580.Dq block 1581is used in different 1582meanings, and its use is deprecated. 1583.El 1584.Pp 1585For example, the value 16777216 bytes can also be written as 1586.Em 16m , 1587.Em 16384k 1588or 1589.Em 32768s . 1590.Pp 1591The configuration file can contain the following entries: 1592.Bl -tag -width 4n 1593.It Ic drive Ar name devicename Op Ar options 1594Define a drive. The options are: 1595.Bl -tag -width 18n 1596.It Cm device Ar devicename 1597Specify the device on which the drive resides. 1598.Ar devicename 1599must be the name of a disk partition, for example 1600.Pa /dev/da1s0e 1601or 1602.Pa /dev/ad3s2h , 1603and it must be of type 1604.Em vinum . 1605Do not use the 1606.Dq Li c 1607partition, which is reserved for the complete disk. 1608.It Cm hotspare 1609Define the drive to be a 1610.Dq hot spare 1611drive, which is maintained to automatically replace a failed drive. 1612.Nm 1613does not allow this drive to be used for any other purpose. In particular, it 1614is not possible to create subdisks on it. This functionality has not been 1615completely implemented. 1616.El 1617.It Ic volume Ar name Op Ar options 1618Define a volume with name 1619.Ar name . 1620Options are: 1621.Bl -tag -width 18n 1622.It Cm plex Ar plexname 1623Add the specified plex to the volume. If 1624.Ar plexname 1625is specified as 1626.Cm * , 1627.Nm 1628will look for the definition of the plex as the next possible entry in the 1629configuration file after the definition of the volume. 1630.It Cm readpol Ar policy 1631Define a 1632.Em read policy 1633for the volume. 1634.Ar policy 1635may be either 1636.Cm round 1637or 1638.Cm prefer Ar plexname . 
1639.Nm 1640satisfies a read request from only one of the plexes. A 1641.Cm round 1642read policy specifies that each read should be performed from a different plex 1643in 1644.Em round-robin 1645fashion. A 1646.Cm prefer 1647read policy reads from the specified plex every time. 1648.It Cm setupstate 1649When creating a multi-plex volume, assume that the contents of all the plexes 1650are consistent. This is normally not the case, so by default 1651.Nm 1652sets all plexes except the first one to the 1653.Em faulty 1654state. Use the 1655.Ic start 1656command to first bring them to a consistent state. In the case of striped and 1657concatenated plexes, however, it does not normally cause problems to leave them 1658inconsistent: when using a volume for a file system or a swap partition, the 1659previous contents of the disks are not of interest, so they may be ignored. 1660If you want to take this risk, use the 1661.Cm setupstate 1662keyword. It will only apply to the plexes defined immediately after the volume 1663in the configuration file. If you add plexes to a volume at a later time, you 1664must integrate them manually with the 1665.Ic start 1666command. 1667.Pp 1668Note that you 1669.Em must 1670use the 1671.Ic init 1672command with RAID-5 plexes: otherwise extreme data corruption will result if one 1673subdisk fails. 1674.El 1675.It Ic plex Op Ar options 1676Define a plex. Unlike a volume, a plex does not need a name. The options may 1677be: 1678.Bl -tag -width 18n 1679.It Cm name Ar plexname 1680Specify the name of the plex. Note that you must use the keyword 1681.Cm name 1682when naming a plex or subdisk. 1683.It Cm org Ar organization Op Ar stripesize 1684Specify the organization of the plex. 1685.Ar organization 1686can be one of 1687.Cm concat , striped 1688or 1689.Cm raid5 . 1690For 1691.Cm striped 1692and 1693.Cm raid5 1694plexes, the parameter 1695.Ar stripesize 1696must be specified, while for 1697.Cm concat 1698it must be omitted. 
For type 1699.Cm striped , 1700it specifies the width of each stripe. For type 1701.Cm raid5 , 1702it specifies the size of a group. A group is a portion of a plex which 1703stores the parity bits all in the same subdisk. It must be a factor of the plex size (in 1704other words, the result of dividing the plex size by the stripe size must be an 1705integer), and it must be a multiple of a disk sector (512 bytes). 1706.Pp 1707For optimum performance, stripes should be at least 128 kB in size: anything 1708smaller will result in a significant increase in I/O activity due to mapping of 1709individual requests over multiple disks. The performance improvement due to the 1710increased number of concurrent transfers caused by this mapping will not make up 1711for the performance drop due to the increase in latency. A good guideline for 1712stripe size is between 256 kB and 512 kB. Avoid powers of 2, however: they tend 1713to cause all superblocks to be placed on the first subdisk. 1714.Pp 1715A striped plex must have at least two subdisks (otherwise it is a concatenated 1716plex), and each must be the same size. A RAID-5 plex must have at least three 1717subdisks, and each must be the same size. In practice, a RAID-5 plex should 1718have at least 5 subdisks. 1719.It Cm volume Ar volname 1720Add the plex to the specified volume. If no 1721.Cm volume 1722keyword is specified, the plex will be added to the last volume mentioned in the 1723configuration file. 1724.It Cm sd Ar sdname offset 1725Add the specified subdisk to the plex at offset 1726.Ar offset . 1727.El 1728.It Ic subdisk Op Ar options 1729Define a subdisk. Options may be: 1730.Bl -hang -width 18n 1731.It Cm name Ar name 1732Specify the name of a subdisk. It is not necessary to specify a name for a 1733subdisk. 1734Note that you must specify the keyword 1735.Cm name 1736if you wish to name a subdisk. 1737.It Cm plexoffset Ar offset 1738Specify the starting offset of the subdisk in the plex. 
If not specified, 1739.Nm 1740allocates the space immediately after the previous subdisk, if any, or otherwise 1741at the beginning of the plex. 1742.It Cm driveoffset Ar offset 1743Specify the starting offset of the subdisk in the drive. If not specified, 1744.Nm 1745allocates the first contiguous 1746.Ar length 1747bytes of free space on the drive. 1748.It Cm length Ar length 1749Specify the length of the subdisk. This keyword must be specified. There is no 1750default, but the value 0 may be specified to mean 1751.Dq "use the largest available contiguous free area on the drive" . 1752If the drive is empty, this means that the entire drive will be used for the 1753subdisk. 1754.Cm length 1755may be shortened to 1756.Cm len . 1757.It Cm plex Ar plex 1758Specify the plex to which the subdisk belongs. By default, the subdisk belongs 1759to the last plex specified. 1760.It Cm drive Ar drive 1761Specify the drive on which the subdisk resides. By default, the subdisk resides 1762on the last drive specified. 
1763.El 1764.El 1765.Sh EXAMPLE CONFIGURATION FILE 1766.Bd -literal 1767# Sample vinum configuration file 1768# 1769# Our drives 1770drive drive1 device /dev/da1s0h 1771drive drive2 device /dev/da2s0h 1772drive drive3 device /dev/da3s0h 1773drive drive4 device /dev/da4s0h 1774drive drive5 device /dev/da5s0h 1775drive drive6 device /dev/da6s0h 1776# A volume with one striped plex 1777volume tinyvol 1778 plex org striped 512b 1779 sd length 64m drive drive2 1780 sd length 64m drive drive4 1781volume stripe 1782 plex org striped 512b 1783 sd length 512m drive drive2 1784 sd length 512m drive drive4 1785# Two plexes 1786volume concat 1787 plex org concat 1788 sd length 100m drive drive2 1789 sd length 50m drive drive4 1790 plex org concat 1791 sd length 150m drive drive4 1792# A volume with one striped plex and one concatenated plex 1793volume strcon 1794 plex org striped 512b 1795 sd length 100m drive drive2 1796 sd length 100m drive drive4 1797 plex org concat 1798 sd length 150m drive drive2 1799 sd length 50m drive drive4 1800# a volume with a RAID-5 and a striped plex 1801# note that the RAID-5 volume is longer by 1802# the length of one subdisk 1803volume vol5 1804 plex org striped 64k 1805 sd length 1000m drive drive2 1806 sd length 1000m drive drive4 1807 plex org raid5 32k 1808 sd length 500m drive drive1 1809 sd length 500m drive drive2 1810 sd length 500m drive drive3 1811 sd length 500m drive drive4 1812 sd length 500m drive drive5 1813.Ed 1814.Sh DRIVE LAYOUT CONSIDERATIONS 1815.Nm 1816drives are currently 1817.Bx 1818disk partitions. They must be of type 1819.Em vinum 1820in order to avoid overwriting data used for other purposes. Use 1821.Nm disklabel Fl e 1822to edit a partition type definition. 
The following display shows a typical 1823partition layout as shown by 1824.Xr disklabel 8 : 1825.Bd -literal 182616 partitions: 1827# size offset fstype 1828 a: 81920 344064 4.2BSD # 40.000M 1829 b: 262144 81920 swap # 128.000M 1830 c: 4226725 0 unused # 2063.830M 1831 e: 81920 0 4.2BSD # 40.000M 1832 f: 1900000 425984 4.2BSD # 927.734M 1833 g: 1900741 2325984 vinum # 928.095M 1834.Ed 1835.Pp 1836In this example, partition 1837.Dq Li g 1838may be used as a 1839.Nm 1840partition. Partitions 1841.Dq Li a , 1842.Dq Li e 1843and 1844.Dq Li f 1845may be used as 1846.Xr UFS 5 1847file systems. 1848Partition 1849.Dq Li b 1850is a swap partition, and partition 1851.Dq Li c 1852represents the whole disk and should not be used for any other purpose. 1853.Pp 1854.Nm 1855uses the first 265 sectors on each partition for configuration information, so 1856the maximum size of a subdisk is 265 sectors smaller than the drive. 1857.Sh LOG FILE 1858.Nm 1859maintains a log file, by default 1860.Pa /var/tmp/vinum_history , 1861in which it keeps track of the commands issued to 1862.Nm . 1863You can override the name of this file by setting the environment variable 1864.Ev VINUM_HISTORY 1865to the name of the file. 1866.Pp 1867Each message in the log file is preceded by a date. The default format is 1868.Qq Li %e %b %Y %H:%M:%S . 1869See 1870.Xr strftime 3 1871for further details of the format string. It can be overridden by the 1872environment variable 1873.Ev VINUM_DATEFORMAT . 1874.Sh HOW TO SET UP VINUM 1875This section gives practical advice about how to implement a 1876.Nm 1877system. 1878.Ss Where to put the data 1879The first choice you need to make is where to put the data. You need dedicated 1880disk partitions for 1881.Nm . 1882They should be partitions, not devices, and they should not be partition 1883.Dq Li c . 1884For example, good names are 1885.Pa /dev/da0s0e 1886or 1887.Pa /dev/ad3s4a . 
1888Bad names are 1889.Pa /dev/da0 1890and 1891.Pa /dev/da0s1 , 1892both of which represent a device, not a partition, and 1893.Pa /dev/ad1s0c , 1894which represents a complete disk and should be of type 1895.Em unused . 1896See the example under 1897.Sx DRIVE LAYOUT CONSIDERATIONS 1898above. 1899.Ss Designing volumes 1900The way you set up 1901.Nm 1902volumes depends on your intentions. There are a number of possibilities: 1903.Bl -enum 1904.It 1905You may want to join up a number of small disks to make a reasonable sized file 1906system. For example, if you had five small drives and wanted to use all the 1907space for a single volume, you might write a configuration file like: 1908.Bd -literal -offset indent 1909drive d1 device /dev/da2s0e 1910drive d2 device /dev/da3s0e 1911drive d3 device /dev/da4s0e 1912drive d4 device /dev/da5s0e 1913drive d5 device /dev/da6s0e 1914volume bigger 1915 plex org concat 1916 sd length 0 drive d1 1917 sd length 0 drive d2 1918 sd length 0 drive d3 1919 sd length 0 drive d4 1920 sd length 0 drive d5 1921.Ed 1922.Pp 1923In this case, you specify the length of the subdisks as 0, which means 1924.Dq "use the largest area of free space that you can find on the drive" . 1925If the subdisk is the only subdisk on the drive, it will use all available 1926space. 1927.It 1928You want to set up 1929.Nm 1930to obtain additional resilience against disk failures. You have the choice of 1931RAID-1, also called 1932.Dq mirroring , 1933or RAID-5, also called 1934.Dq parity . 1935.Pp 1936To set up mirroring, create multiple plexes in a volume. 
For example, to create 1937a mirrored volume of 2 GB, you might create the following configuration file: 1938.Bd -literal -offset indent 1939drive d1 device /dev/da2s0e 1940drive d2 device /dev/da3s0e 1941volume mirror 1942 plex org concat 1943 sd length 2g drive d1 1944 plex org concat 1945 sd length 2g drive d2 1946.Ed 1947.Pp 1948When creating mirrored drives, it is important to ensure that the data from each 1949plex is on a different physical disk so that 1950.Nm 1951can access the complete address space of the volume even if a drive fails. 1952Note that each plex requires as much data as the complete volume: in this 1953example, the volume has a size of 2 GB, but each plex (and each subdisk) 1954requires 2 GB, so the total disk storage requirement is 4 GB. 1955.Pp 1956To set up RAID-5, create a single plex of type 1957.Cm raid5 . 1958For example, to create an equivalent resilient volume of 2 GB, you might use the 1959following configuration file: 1960.Bd -literal -offset indent 1961drive d1 device /dev/da2s0e 1962drive d2 device /dev/da3s0e 1963drive d3 device /dev/da4s0e 1964drive d4 device /dev/da5s0e 1965drive d5 device /dev/da6s0e 1966volume raid 1967 plex org raid5 512k 1968 sd length 512m drive d1 1969 sd length 512m drive d2 1970 sd length 512m drive d3 1971 sd length 512m drive d4 1972 sd length 512m drive d5 1973.Ed 1974.Pp 1975RAID-5 plexes require at least three subdisks, one of which is used for storing 1976parity information and is lost for data storage. The more disks you use, the 1977greater the proportion of the disk storage can be used for data storage. In 1978this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored 1979configuration. 
If you were to use the minimum of only three disks, you would 1980require 3 GB to store the information, for example: 1981.Bd -literal -offset indent 1982drive d1 device /dev/da2s0e 1983drive d2 device /dev/da3s0e 1984drive d3 device /dev/da4s0e 1985volume raid 1986 plex org raid5 512k 1987 sd length 1g drive d1 1988 sd length 1g drive d2 1989 sd length 1g drive d3 1990.Ed 1991.Pp 1992As with creating mirrored drives, it is important to ensure that the data from 1993each subdisk is on a different physical disk so that 1994.Nm 1995can access the complete address space of the volume even if a drive fails. 1996.It 1997You want to set up 1998.Nm 1999to allow more concurrent access to a file system. In many cases, access to a 2000file system is limited by the speed of the disk. By spreading the volume across 2001multiple disks, you can increase the throughput in multi-access environments. 2002This technique shows little or no performance improvement in single-access 2003environments. 2004.Nm 2005uses a technique called 2006.Dq striping , 2007or sometimes RAID-0, to increase this concurrency of access. The name RAID-0 is 2008misleading: striping does not provide any redundancy or additional reliability. 2009In fact, it decreases the reliability, since the failure of a single disk will 2010render the volume useless, and the more disks you have, the more likely it is 2011that one of them will fail. 2012.Pp 2013To implement striping, use a 2014.Cm striped 2015plex: 2016.Bd -literal -offset indent 2017drive d1 device /dev/da2s0e 2018drive d2 device /dev/da3s0e 2019drive d3 device /dev/da4s0e 2020drive d4 device /dev/da5s0e 2021volume raid 2022 plex org striped 512k 2023 sd length 512m drive d1 2024 sd length 512m drive d2 2025 sd length 512m drive d3 2026 sd length 512m drive d4 2027.Ed 2028.Pp 2029A striped plex must have at least two subdisks, but the increase in performance 2030is greater if you have a larger number of disks. 
2031.It 2032You may want to have the best of both worlds and have both resilience and 2033performance. This is sometimes called RAID-10 (a combination of RAID-1 and 2034RAID-0), though again this name is misleading. With 2035.Nm 2036you can do this with the following configuration file: 2037.Bd -literal -offset indent 2038drive d1 device /dev/da2s0e 2039drive d2 device /dev/da3s0e 2040drive d3 device /dev/da4s0e 2041drive d4 device /dev/da5s0e 2042volume raid setupstate 2043 plex org striped 512k 2044 sd length 512m drive d1 2045 sd length 512m drive d2 2046 sd length 512m drive d3 2047 sd length 512m drive d4 2048 plex org striped 512k 2049 sd length 512m drive d4 2050 sd length 512m drive d3 2051 sd length 512m drive d2 2052 sd length 512m drive d1 2053.Ed 2054.Pp 2055Here the plexes are striped, increasing performance, and there are two of them, 2056increasing reliability. Note that this example shows the subdisks of the second 2057plex in reverse order from the first plex. This is for performance reasons and 2058will be discussed below. In addition, the volume specification includes the 2059keyword 2060.Cm setupstate , 2061which ensures that all plexes are 2062.Em up 2063after creation. 2064.El 2065.Ss Creating the volumes 2066Once you have created your configuration files, start 2067.Nm 2068and create the volumes. 
In this example, the configuration is in the file
.Pa configfile :
.Bd -literal -offset 2n
# vinum create -v configfile
   1: drive d1 device /dev/da2s0e
   2: drive d2 device /dev/da3s0e
   3: volume mirror
   4:   plex org concat
   5:     sd length 2g drive d1
   6:   plex org concat
   7:     sd length 2g drive d2
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         2 (8 configured)
Subdisks:       2 (16 configured)

Drive d1:       Device /dev/da2s0e
                Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
                Config last updated Tue Mar 23 14:30:32 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none
Drive d2:       Device /dev/da3s0e
                Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
                Config last updated Tue Mar 23 14:30:33 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none

Volume mirror:  Size: 2147483648 bytes (2048 MB)
                State: up
                Flags:
                2 plexes
                Read policy: round robin

Plex mirror.p0: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror
Plex mirror.p1: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror

Subdisk mirror.p0.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p0 at offset 0

Subdisk mirror.p1.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p1 at offset 0
.Ed
.Pp
The
.Fl v
option tells
.Nm
to list the file as it configures. Subsequently it lists the current
configuration in the same format as the
.Ic list Fl v
command.
.Ss Creating more volumes
Once you have created the
.Nm
volumes,
.Nm
keeps track of them in its internal configuration files. You do not need to
create them again. In particular, if you run the
.Ic create
command again, you will create additional objects:
.Bd -literal
# vinum create sampleconfig
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         4 (8 configured)
Subdisks:       4 (16 configured)

D d1              State: up     Device /dev/da2s0e    Avail: 53224/57320 MB (92%)
D d2              State: up     Device /dev/da3s0e    Avail: 53224/57320 MB (92%)

V mirror          State: up     Plexes:       4       Size:       2048 MB

P mirror.p0     C State: up     Subdisks:     1       Size:       2048 MB
P mirror.p1     C State: up     Subdisks:     1       Size:       2048 MB
P mirror.p2     C State: up     Subdisks:     1       Size:       2048 MB
P mirror.p3     C State: up     Subdisks:     1       Size:       2048 MB

S mirror.p0.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p1.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p2.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p3.s0    State: up     PO:        0  B       Size:       2048 MB
.Ed
.Pp
As this example (this time with the
.Fl f
option) shows, re-running the
.Ic create
command has created two additional plexes, each with a new subdisk, so that the
volume now has four plexes in all. If you want to add other
volumes, create new configuration files for them. They do not need to reference
the drives that
.Nm
already knows about.
For example, to create a volume
.Pa raid
on the four drives
.Pa /dev/da1s0e , /dev/da2s0e , /dev/da3s0e
and
.Pa /dev/da4s0e ,
you only need to mention the other two:
.Bd -literal -offset indent
drive d3 device /dev/da1s0e
drive d4 device /dev/da4s0e
volume raid
  plex org raid5 512k
    sd size 2g drive d1
    sd size 2g drive d2
    sd size 2g drive d3
    sd size 2g drive d4
.Ed
.Pp
With this configuration file, we get:
.Bd -literal
# vinum create newconfig
Configuration summary

Drives:         4 (4 configured)
Volumes:        2 (4 configured)
Plexes:         5 (8 configured)
Subdisks:       8 (16 configured)

D d1              State: up     Device /dev/da2s0e    Avail: 51176/57320 MB (89%)
D d2              State: up     Device /dev/da3s0e    Avail: 53220/57320 MB (89%)
D d3              State: up     Device /dev/da1s0e    Avail: 53224/57320 MB (92%)
D d4              State: up     Device /dev/da4s0e    Avail: 53224/57320 MB (92%)

V mirror          State: down   Plexes:       4       Size:       2048 MB
V raid            State: down   Plexes:       1       Size:       6144 MB

P mirror.p0     C State: init   Subdisks:     1       Size:       2048 MB
P mirror.p1     C State: init   Subdisks:     1       Size:       2048 MB
P mirror.p2     C State: init   Subdisks:     1       Size:       2048 MB
P mirror.p3     C State: init   Subdisks:     1       Size:       2048 MB
P raid.p0      R5 State: init   Subdisks:     4       Size:       6144 MB

S mirror.p0.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p1.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p2.s0    State: up     PO:        0  B       Size:       2048 MB
S mirror.p3.s0    State: up     PO:        0  B       Size:       2048 MB
S raid.p0.s0      State: empty  PO:        0  B       Size:       2048 MB
S raid.p0.s1      State: empty  PO:      512 kB       Size:       2048 MB
S raid.p0.s2      State: empty  PO:     1024 kB       Size:       2048 MB
S raid.p0.s3      State: empty  PO:     1536 kB       Size:       2048 MB
.Ed
.Pp
Note the size of the RAID-5 plex: it is only 6 GB, although together its
components use 8 GB of disk space. This is because the equivalent of one
subdisk is used for storing parity data.
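.Pp
As a quick check of the figures in this listing, the usable size of a RAID-5
plex can be estimated as the combined size of all but one of its subdisks.
This is only a sketch: it assumes equal-sized subdisks and ignores any rounding
that
.Nm
may apply.
.Bd -literal -offset indent
number of subdisks = 4
subdisk size       = 2048 MB
space consumed     = 4 * 2048 MB       = 8192 MB (8 GB)
usable plex size   = (4 - 1) * 2048 MB = 6144 MB (6 GB)
.Ed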
.Ss Restarting Vinum
On rebooting the system, start
.Nm
with the
.Ic start
command:
.Pp
.Dl "# vinum start"
.Pp
This will start all the
.Nm
drives in the system. If for some reason you wish to start only some of them,
use the
.Ic read
command.
.Ss Performance considerations
A number of misconceptions exist about how to set up a RAID array for best
performance. In particular, most systems use far too small a stripe size. The
following discussion applies to all RAID systems, not just to
.Nm .
.Pp
The
.Dx
block I/O system issues requests of between 0.5 kB and 128 kB; a
typical mix is somewhere around 8 kB. You can't stop any striping system from
breaking a request into two physical requests, and if you make the stripe small
enough, it can be broken into several. This will result in a significant drop
in performance: the decrease in transfer time per disk is outweighed by the
order-of-magnitude greater increase in latency.
.Pp
With modern disk sizes and the
.Dx
I/O system, you can expect to have a
reasonably small number of fragmented requests with a stripe size between 256 kB
and 512 kB; with correct RAID implementations there is no obvious reason not to
increase the size to 2 or 4 MB on a large disk.
.Pp
When choosing a stripe size, consider that most current
.Xr UFS 5
file systems have
cylinder groups 32 MB in size. If the stripe size and the number of disks are
both powers of two, it is probable that all superblocks and inodes
will be placed on the same subdisk, which will impact performance significantly.
Choose an odd number instead, for example 479 kB.
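.Pp
The following arithmetic sketches why a power-of-two layout tends to
concentrate file system metadata on one subdisk.
It rests on simplifying assumptions that are not stated above: a plex with four
subdisks, cylinder groups that start at exact multiples of 32 MB (32768 kB)
from the beginning of the volume, and stripes assigned to subdisks in simple
rotation.
.Bd -literal -offset indent
stripe number = floor(offset / stripe size)
subdisk index = stripe number mod 4

                   512 kB stripe       479 kB stripe
cylinder group    stripe  subdisk     stripe  subdisk
      0               0      0            0      0
      1              64      0           68      0
      2             128      0          136      0
      3             192      0          205      1
      4             256      0          273      1
      5             320      0          342      2
.Ed
.Pp
Under these assumptions, every cylinder group begins on the same subdisk with
512 kB stripes, whereas with 479 kB stripes the starting subdisk drifts across
the plex.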
.Pp
The easiest way to consider the impact of any transfer in a multi-access system
is to look at it from the point of view of the potential bottleneck, the disk
subsystem: how much total disk time does the transfer use?
Since just about
everything is cached, the time relationship between the request and its
completion is not so important: the important parameter is the total time that
the request keeps the disks active, the time when the disks are not available to
perform other transfers. As a result, it doesn't really matter if the transfers
are happening at the same time or different times. In practical terms, the time
we're looking at is the sum of the total latency (positioning time and
rotational latency, or the time it takes for the data to arrive under the disk
heads) and the total transfer time. For a given transfer to disks of the same
speed, the transfer time depends only on the total size of the transfer.
.Pp
Consider a typical news article or web page of 24 kB, which will probably be
read in a single I/O. Take disks with a transfer rate of 6 MB/s and an average
positioning time of 8 ms, and a file system with 4 kB blocks. Since it's 24 kB,
we don't have to worry about fragments, so the file will start on a 4 kB
boundary. The number of transfers required depends on where the block starts:
it's (S + F - 1) / S, where S is the stripe size in file system blocks, and F is
the file size in file system blocks.
.Bl -enum
.It
Stripe size of 4 kB. You'll have 6 transfers. Total subsystem load: 48 ms
latency, 2 ms transfer, 50 ms total.
.It
Stripe size of 8 kB. On average, you'll have 3.5 transfers. Total subsystem
load: 28 ms latency, 2 ms transfer, 30 ms total.
.It
Stripe size of 16 kB. On average, you'll have 2.25 transfers. Total subsystem
load: 18 ms latency, 2 ms transfer, 20 ms total.
.It
Stripe size of 256 kB.
On average, you'll have 1.08 transfers. Total subsystem
load: 8.6 ms latency, 2 ms transfer, 10.6 ms total.
.It
Stripe size of 4 MB. On average, you'll have 1.005 transfers. Total subsystem
load: 8.04 ms latency, 2 ms transfer, 10.04 ms total.
.El
.Pp
It appears that some hardware RAID systems have problems with large stripes:
they appear to always transfer a complete stripe to or from disk, so that a
large stripe size will have an adverse effect on performance.
.Nm
does not suffer from this problem: it optimizes all disk transfers and does not
transfer unneeded data.
.Pp
Note that no well-known benchmark program tests true multi-access conditions
(more than 100 concurrent users), so it is difficult to demonstrate the validity
of these statements.
.Pp
Given these considerations, the following factors affect the performance of a
.Nm
volume:
.Bl -bullet
.It
Striping improves performance for multiple access only, since it increases the
chance of individual requests being on different drives.
.It
Concatenating
.Xr UFS 5
file systems across multiple drives can also improve
performance for multiple file access, since
.Xr UFS 5
divides a file system into
cylinder groups and attempts to keep files in a single cylinder group. In
general, it is not as effective as striping.
.It
Mirroring can improve multi-access performance for reads, since by default
.Nm
issues consecutive reads to consecutive plexes.
.It
Mirroring decreases performance for all writes, whether multi-access or single
access, since the data must be written to both plexes.
This explains the
subdisk layout in the example of a mirroring configuration above: if the
corresponding subdisk in each plex is on a different physical disk, the write
commands can be issued in parallel, whereas if they are on the same physical
disk, they will be performed sequentially.
.It
RAID-5 reads have essentially the same considerations as striped reads, unless
the striped plex is part of a mirrored volume, in which case the performance of
the mirrored volume will be better.
.It
RAID-5 writes are approximately 25% of the speed of striped writes: to perform
the write,
.Nm
must first read the data block and the corresponding parity block, perform some
calculations and write back the parity block and the data block, four times as
many transfers as for writing a striped plex. On the other hand, this is offset
by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
approximately half the speed of writes to a correctly configured volume with two
striped plexes.
.It
When the
.Nm
configuration changes (for example, adding or removing objects, or the change of
state of one of the objects),
.Nm
writes up to 128 kB of updated configuration to each drive. The larger the
number of drives, the longer this takes.
.El
.Ss Creating file systems on Vinum volumes
You do not need to run
.Xr disklabel 8
before creating a file system on a
.Nm
volume. Just run
.Xr newfs 8 .
Use the
.Fl v
option to state that the device is not divided into partitions. For example, to
create a file system on volume
.Pa mirror ,
enter the following command:
.Pp
.Dl "# newfs -v /dev/vinum/mirror"
.Pp
A number of other considerations apply to
.Nm
configuration:
.Bl -bullet
.It
There is no advantage in creating multiple drives on a single disk.
Each drive
uses 131.5 kB of data for label and configuration information, and performance
will suffer when the configuration changes. Use appropriately sized subdisks instead.
.It
It is possible to increase the size of a concatenated
.Nm
plex, but currently the size of striped and RAID-5 plexes cannot be increased.
Currently the size of an existing
.Xr UFS 5
file system also cannot be increased, but
it is planned to make both plexes and file systems extensible.
.El
.Sh STATE MANAGEMENT
Vinum objects have the concept of
.Em state .
See
.Xr vinum 4
for more details. They are only completely accessible if their state is
.Em up .
To change an object state to
.Em up ,
use the
.Ic start
command. To change an object state to
.Em down ,
use the
.Ic stop
command. Normally other states are created automatically by the relationship
between objects. For example, if you add a plex to a volume, the subdisks of
the plex will be set in the
.Em empty
state, indicating that, though the hardware is accessible, the data on the
subdisk is invalid. As a result of this state, the plex will be set in the
.Em faulty
state.
.Ss The `reviving' state
In many cases, when you start a subdisk the system must copy data to the
subdisk. Depending on the size of the subdisk, this can take a long time.
During this time, the subdisk is set in the
.Em reviving
state. On successful completion of the copy operation, it is automatically set
to the
.Em up
state. It is possible for the process performing the revive to be stopped and
restarted. The system keeps track of how far the subdisk has been revived, and
when the
.Ic start
command is reissued, the copying continues from this point.
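.Pp
As an illustration, an interrupted revive might be resumed and then checked as
follows.
The subdisk name
.Pa mirror.p1.s0
is borrowed from the earlier examples and is only a placeholder; substitute the
name of the object being revived:
.Bd -literal -offset indent
# vinum start mirror.p1.s0
# vinum list mirror.p1.s0
.Ed
.Pp
The
.Ic list
command reports the current state of the subdisk, which should be
.Em reviving
while the copy is in progress and
.Em up
once it has completed.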
.Pp
In order to maintain the consistency of a volume while one or more of its plexes
is being revived,
.Nm
writes to subdisks which have been revived up to the point of the write. It may
also read from the plex if the area being read has already been revived.
.Sh GOTCHAS
The following points are not bugs, and they have good reasons for existing, but
they have been shown to cause confusion. Each is discussed in the appropriate
section above.
.Bl -enum
.It
.Nm
drives are
.Ux
disk partitions and must have the partition type
.Em vinum .
.Pp
The
.Nm Ic start
command will not accept a drive on partition
.Dq Li c .
Partition
.Dq Li c
is used by the system to represent the whole disk, and must be of type
.Em unused .
Clearly there is a conflict here, which
.Nm
resolves by not using the
.Dq Li c
partition.
.It
When you create a volume with multiple plexes,
.Nm
does not automatically initialize the plexes. This means that the contents are
not known, but they are certainly not consistent. As a result, by default
.Nm
sets the state of all newly-created plexes except the first to
.Em faulty .
In order to synchronize them with the first plex, you must
.Ic start
them, which causes
.Nm
to copy the data from a plex which is in the
.Em up
state. Depending on the size of the subdisks involved, this can take a long
time.
.Pp
In practice, people aren't too interested in what was in the plex when it was
created, and other volume managers cheat by setting them
.Em up
anyway.
.Nm
provides two ways to ensure that newly created plexes are
.Em up :
.Bl -bullet
.It
Create the plexes and then synchronize them with
.Nm Ic start .
.It
Create the volume (not the plex) with the keyword
.Cm setupstate ,
which tells
.Nm
to ignore any possible inconsistency and set the plexes to be
.Em up .
.El
.It
Some of the commands currently supported by
.Nm
are not really needed. For reasons which I don't understand, however, I find
that users frequently try the
.Ic label
and
.Ic resetconfig
commands, even though
.Ic resetconfig
in particular outputs all sorts of dire warnings. Don't use these commands
unless you have a good reason to do so.
.It
Some state transitions are not very intuitive. In fact, it's not clear whether
this is a bug or a feature. If you find that you can't start an object in some
strange state, such as a
.Em reborn
subdisk, try first to get it into
.Em stopped
state, with the
.Ic stop
or
.Ic stop Fl f
commands. If that works, you should then be able to start it. If you find
that this is the only way to get out of a position where easier methods fail,
please report the situation.
.It
If you build the kernel module with the
.Fl D Ns Dv VINUMDEBUG
option, you must also build
.Nm
with the
.Fl D Ns Dv VINUMDEBUG
option, since the size of some data objects used by both components depends on
this option. If you don't do so, commands will fail with the message
.Sy Invalid argument ,
and a console message will be logged such as
.Bl -diag
.It "vinumioctl: invalid ioctl from process 247 (vinum): c0e44642"
.El
.Pp
This error may also occur if you use an old version of the KLD or of the
userland program.
.It
The
.Nm Ic read
command has a particularly emetic syntax. Once it was the only way to start
.Nm ,
but now the preferred method is with
.Nm Ic start .
.Nm Ic read
should be used for maintenance purposes only.
Note that its syntax has changed,
and the arguments must be disk slices, such as
.Pa /dev/da0s0 ,
not partitions such as
.Pa /dev/da0s0e .
.El
.Sh ENVIRONMENT
.Bl -tag -width VINUM_DATEFORMAT
.It Ev VINUM_HISTORY
The name of the log file, by default
.Pa /var/log/vinum_history .
.It Ev VINUM_DATEFORMAT
The format of dates in the log file, by default
.Qq Li %e %b %Y %H:%M:%S .
.It Ev EDITOR
The name of the editor to use for editing configuration files, by default
.Nm vi .
.El
.Sh FILES
.Bl -tag -width /dev/vinum/control -compact
.It Pa /dev/vinum
directory with device nodes for
.Nm
objects
.It Pa /dev/vinum/control
control device for
.Nm
.It Pa /dev/vinum/plex
directory containing device nodes for
.Nm
plexes
.It Pa /dev/vinum/sd
directory containing device nodes for
.Nm
subdisks
.El
.Sh SEE ALSO
.Xr strftime 3 ,
.Xr vinum 4 ,
.Xr disklabel 8 ,
.Xr newfs 8
.Pp
.Pa http://www.vinumvm.org/vinum/ ,
.Pa http://www.vinumvm.org/vinum/how-to-debug.html .
.Sh HISTORY
The
.Nm
command first appeared in
.Fx 3.0 .
The RAID-5 component of
.Nm
was developed for Cybernet Inc.\&
.Pq Pa www.cybernet.com
for its NetMAX product.
.Sh AUTHORS
.An Greg Lehey Aq grog@lemis.com
.\"XXX.Sh BUGS