.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode
.\"-
.\" Copyright (c) 1997, 1998
.\"     Nan Yang Computer Services Limited. All rights reserved.
.\"
.\" This software is distributed under the so-called ``Berkeley
.\" License'':
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"      This product includes software developed by Nan Yang Computer
.\"      Services Limited.
.\" 4. Neither the name of the Company nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" This software is provided ``as is'', and any express or implied
.\" warranties, including, but not limited to, the implied warranties of
.\" merchantability and fitness for a particular purpose are disclaimed.
.\" In no event shall the company or contributors be liable for any
.\" direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute
.\" goods or services; loss of use, data, or profits; or business
.\" interruption) however caused and on any theory of liability, whether
.\" in contract, strict liability, or tort (including negligence or
.\" otherwise) arising in any way out of the use of this software, even if
.\" advised of the possibility of such damage.
.\"
.\" $Id: vinum.8,v 1.48 2001/01/15 22:15:05 grog Exp $
.\" $FreeBSD: src/sbin/vinum/vinum.8,v 1.33.2.10 2002/12/29 16:35:38 schweikh Exp $
.\"
.Dd December 20, 2000
.Dt VINUM 8
.Os
.Sh NAME
.Nm vinum
.Nd Logical Volume Manager control program
.Sh SYNOPSIS
.Nm
.Op Ar command
.Op Fl options
.Sh COMMANDS
.Bl -tag -width indent
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
Attach a plex to a volume, or a subdisk to a plex.
.It Xo
.Ic checkparity Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks of a RAID-4 or RAID-5 plex.
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a concatenated volume from the specified drives.
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
Create a volume as described in
.Ar description-file .
.It Ic debug
Cause the volume manager to enter the kernel debugger.
.It Ic debug Ar flags
Set debugging flags.
.It Xo
.Ic detach
.Op Fl f
.Op Ar plex | subdisk
.Xc
Detach a plex or subdisk from the volume or plex to which it is attached.
.It Ic dumpconfig Op Ar drive ...
List the configuration information stored on the specified drives, or all drives
in the system if no drive names are specified.
.It Xo
.Ic info
.Op Fl v
.Op Fl V
.Xc
List information about volume manager state.
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.\" XXX
Initialize the contents of a subdisk or all the subdisks of a plex to all zeros.
.It Ic label Ar volume
Create a volume label.
.It Xo
.Ic l | list
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
List information about specified objects.
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
List information about drives.
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
List information about subdisks.
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
List information about plexes.
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
List information about volumes.
.It Ic makedev
Remake the device nodes in
.Pa /dev/vinum .
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
Create a mirrored volume from the specified drives.
.It Xo
.Ic move | mv
.Fl f
.Ar drive object ...
.Xc
Move the object(s) to the specified drive.
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file .
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.It Ic read Ar disk ...
Read the
.Nm
configuration from the specified disks.
.It Xo
.Ic rename Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object.
.\" XXX
.\".It Ic replace Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive.
.It Xo
.Ic rebuildparity Ar plex Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks of a RAID-4 or RAID-5 plex.
.It Ic resetconfig
Reset the complete
.Nm
configuration.
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
Reset statistics counters for the specified objects, or for all objects if none
are specified.
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
Remove an object.
.It Ic saveconfig
Save
.Nm
configuration to disk after configuration failures.
.\" XXX
.\".It Xo
.\".Ic set
.\".Op Fl f
.\".Ar state
.\".Ar volume | plex | subdisk | disk
.\".Xc
.\"Set the state of the object to
.\".Ar state .
.It Ic setdaemon Op Ar value
Set daemon configuration.
.It Xo
.Ic setstate
.Ar state
.Op Ar volume | plex | subdisk | drive
.Xc
Set state without influencing other objects, for diagnostic purposes only.
.It Ic start
Read configuration from all vinum drives.
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Ar volume | plex | subdisk
.Xc
Allow the system to access the objects.
.It Xo
.Ic stop
.Op Fl f
.Op Ar volume | plex | subdisk
.Xc
Terminate access to the objects, or stop
.Nm
if no parameters are specified.
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a striped volume from the specified drives.
.El
.Sh DESCRIPTION
.Nm
is a utility program to communicate with the
.Xr vinum 4
logical volume
manager.
.Nm
is designed either for interactive use, when started without command line
arguments, or to execute a single command if the command is supplied on the
command line. In interactive mode,
.Nm
maintains a command line history.
.Sh OPTIONS
.Nm
commands may optionally be followed by an option. Any of the following options
may be specified with any command, but in some cases the options are ignored.
For example, the
.Ic stop
command ignores the
.Fl v
and
.Fl V
options.
.Bl -tag -width indent
.It Fl f
The
.Fl f
.Pq Dq force
option overrides safety checks. Use with extreme care. This option is for
emergency use only. For example, the command
.Pp
.Dl rm -f myvolume
.Pp
removes
.Ar myvolume
even if it is open. Any subsequent access to the volume will almost certainly
cause a panic.
.It Fl i Ar millisecs
When performing the
.Ic init
and
.Ic start
commands, wait
.Ar millisecs
milliseconds between copying each block. This lowers the load on the system.
.It Fl n Ar name
Use the
.Fl n
option to specify a volume name to the simplified configuration commands
.Ic concat , mirror
and
.Ic stripe .
.It Fl r
The
.Fl r
.Pq Dq recursive
option is used by the list commands to display information not
only about the specified objects, but also about subordinate objects. For
example, in conjunction with the
.Ic lv
command, the
.Fl r
option will also show information about the plexes and subdisks belonging to the
volume.
.It Fl s
The
.Fl s
.Pq Dq statistics
option is used by the list commands to display statistical information. The
.Ic mirror
command also uses this option to specify that it should create striped plexes.
.It Fl S Ar size
The
.Fl S
option specifies the transfer size for the
.Ic init
and
.Ic start
commands.
.It Fl v
The
.Fl v
.Pq Dq verbose
option can be used to request more detailed information.
.It Fl V
The
.Fl V
.Pq Dq Very verbose
option can be used to request more detailed information than the
.Fl v
option provides.
.It Fl w
The
.Fl w
.Pq Dq wait
option tells
.Nm
to wait for completion of commands which normally run in the background, such as
.Ic init .
.El
.Sh COMMANDS IN DETAIL
.Nm
commands perform the following functions:
.Pp
.Bl -tag -width indent -compact
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
.Nm Ic attach
inserts the specified plex or subdisk in a volume or plex. In the case of a
subdisk, an offset in the plex may be specified. If it is not, the subdisk will
be attached at the first possible location.
After attaching a plex to a
non-empty volume,
.Nm
reintegrates the plex.
.Pp
If the keyword
.Cm rename
is specified,
.Nm
renames the object (and in the case of a plex, any subordinate subdisks) to fit
in with the default
.Nm
naming convention. To rename the object to any other name, use the
.Ic rename
command.
.Pp
A number of considerations apply to attaching subdisks:
.Bl -bullet
.It
Subdisks can normally only be attached to concatenated plexes.
.It
If a striped or RAID-5 plex is missing a subdisk (for example after drive
failure), it should be replaced by a subdisk of the same size only.
.It
In order to add further subdisks to a striped or RAID-5 plex, use the
.Fl f
(force) option. This will corrupt the data in the plex.
.\"No other attachment of
.\"subdisks is currently allowed for striped and RAID-5 plexes.
.It
For concatenated plexes, the
.Ar offset
parameter specifies the offset in blocks from the beginning of the plex. For
striped and RAID-5 plexes, it specifies the offset of the first block of the
subdisk: in other words, the offset is the numerical position of the subdisk
multiplied by the stripe size. For example, in a plex with stripe size 271k,
the first subdisk will have offset 0, the second offset 271k, the third 542k,
etc. This calculation ignores parity blocks in RAID-5 plexes.
.El
.Pp
.It Xo
.Ic checkparity
.Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks on the specified RAID-4 or RAID-5 plex. This operation
maintains a pointer in the plex, so it can be stopped and later restarted from
the same position if desired. In addition, this pointer is used by the
.Ic rebuildparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
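.Pp
The offset rule for striped and RAID-5 plexes given under
.Ic attach
above is a single multiplication: subdisk position times stripe size.
The following Python sketch only illustrates that arithmetic.
The helper name subdisk_offsets is hypothetical and is not part of vinum, and sizes are assumed to be given in kilobytes.

```python
def subdisk_offsets(stripe_size_kb, subdisk_count):
    """Return the attach offset in kB for each subdisk position.

    Position 0 is the first subdisk of the plex.  As in the text,
    parity blocks in RAID-5 plexes are ignored.
    """
    return [position * stripe_size_kb for position in range(subdisk_count)]


# The example from the text: stripe size 271k, first three subdisks.
print(subdisk_offsets(271, 3))
```

The values printed correspond to the offsets 0, 271k and 542k quoted in the text.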
.Pp
If the
.Fl f
flag is specified,
.Ic checkparity
starts checking at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic checkparity
prints a running progress report.
.Pp
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic concat
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single concatenated plex. The largest
contiguous space available on each drive is used to create the subdisks for the
plex.
.Pp
Normally, the
.Ic concat
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
.Nm Ic create
is used to create any object. In view of the relatively complicated
relationship and the potential dangers involved in creating a
.Nm
object, there is no interactive interface to this function. If you do not
specify a file name,
.Nm
starts an editor on a temporary file.
If the environment variable
.Ev EDITOR
is set,
.Nm
starts this editor. If not, it defaults to
.Nm vi .
See the section
.Sx CONFIGURATION FILE
below for more information on the format of
this file.
.Pp
Note that the
.Nm Ic create
function is additive: if you run it multiple times, you will create multiple
copies of all unnamed objects.
.Pp
Normally the
.Ic create
command will not change the names of existing
.Nm
drives, in order to avoid accidentally erasing them. The correct way to dispose
of no longer wanted
.Nm
drives is to reset the configuration with the
.Ic resetconfig
command. In some cases, however, it may be necessary to create new data on
.Nm
drives which can no longer be started. In this case, use the
.Ic create Fl f
command.
.Pp
.It Ic debug
.Nm Ic debug ,
without any arguments, is used to enter the remote kernel debugger. It is only
activated if
.Nm
is built with the
.Dv VINUMDEBUG
option. This option will stop the execution of the operating system until the
kernel debugger is exited. If remote debugging is set and there is no remote
connection for a kernel debugger, it will be necessary to reset the system and
reboot in order to leave the debugger.
.Pp
.It Ic debug Ar flags
Set a bit mask of internal debugging flags. These will change without warning
as the product matures; to be certain, read the header file
.Aq Pa sys/dev/vinumvar.h .
The bit mask is composed of the following values:
.Bl -tag -width indent
.It Dv DEBUG_ADDRESSES Pq No 1
Show buffer information during requests
.\".It Dv DEBUG_NUMOUTPUT Pq No 2
.\"Show the value of
.\".Va vp->v_numoutput .
.It Dv DEBUG_RESID Pq No 4
Go into debugger in
.Fn complete_rqe .
.It Dv DEBUG_LASTREQS Pq No 8
Keep a circular buffer of last requests.
.It Dv DEBUG_REVIVECONFLICT Pq No 16
Print info about revive conflicts.
.It Dv DEBUG_EOFINFO Pq No 32
Print information about internal state when returning an
.Dv EOF
on a striped plex.
.It Dv DEBUG_MEMFREE Pq No 64
Maintain a circular list of the last memory areas freed by the memory allocator.
.It Dv DEBUG_REMOTEGDB Pq No 256
Go into remote
.Nm gdb
when the
.Ic debug
command is issued.
.It Dv DEBUG_WARNINGS Pq No 512
Print some warnings about minor problems in the implementation.
.El
.Pp
.It Ic detach Oo Fl f Oc Ar plex
.It Ic detach Oo Fl f Oc Ar subdisk
.Nm Ic detach
removes the specified plex or subdisk from the volume or plex to which it is
attached. If removing the object would impair the data integrity of the volume,
the operation will fail unless the
.Fl f
option is specified. If the object is named after the object above it (for
example, subdisk
.Li vol1.p7.s0
attached to plex
.Li vol1.p7 ) ,
the name will be changed
by prepending the text
.Dq Li ex-
(for example,
.Li ex-vol1.p7.s0 ) .
If necessary, the name will be truncated in the
process.
.Pp
.Ic detach
does not reduce the number of subdisks in a striped or RAID-5 plex. Instead,
the subdisk is marked absent, and can later be replaced with the
.Ic attach
command.
.Pp
.It Ic dumpconfig Op Ar drive ...
.Pp
.Nm Ic dumpconfig
shows the configuration information stored on the specified drives. If no drive
names are specified,
.Ic dumpconfig
searches all drives on the system for Vinum partitions and dumps the
information. If configuration updates are disabled, it is possible that this
information is not the same as the information returned by the
.Ic list
command. This command is used primarily for maintenance and debugging.
.Pp
.It Ic info
.Nm Ic info
displays information about
.Nm
memory usage. This is intended primarily for debugging.
With the
.Fl v
option, it will give detailed information about the memory areas in use.
.Pp
With the
.Fl V
option,
.Ic info
displays information about up to the last 64 I/O requests handled by the
.Nm
driver. This information is only collected if debug flag 8 is set. The format
looks like:
.Bd -literal
vinum -> info -V
Flags: 0x200    1 opens
Total of 38 blocks malloced, total memory: 16460
Maximum allocs:      56, malloc table at 0xf0f72dbc

Time            Event     Buf         Dev    Offset    Bytes  SD  SDoff  Doffset  Goffset

14:40:00.637758 1VS Write 0xf2361f40  91.3   0x10      16384
14:40:00.639280 2LR Write 0xf2361f40  91.3   0x10      16384
14:40:00.639294 3RQ Read  0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.639455 3RQ Read  0xf2361f40  4.23   0xd2109   8192   17  0      0        0
14:40:00.639529 3RQ Read  0xf2361f40  4.15   0x6e109   8192   16  0      0        0
14:40:00.652978 4DN Read  0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.667040 4DN Read  0xf2361f40  4.15   0x6e109   8192   16  0      0        0
14:40:00.668556 4DN Read  0xf2361f40  4.23   0xd2109   8192   17  0      0        0
14:40:00.669777 6RP Write 0xf2361f40  4.39   0x104109  8192   19  0      0        0
14:40:00.685547 4DN Write 0xf2361f40  4.39   0x104109  8192   19  0      0        0
11:11:14.975184 Lock      0xc2374210  2      0x1f8001
11:11:15.018400 7VS Write 0xc2374210         0x7c0     32768  10
11:11:15.018456 8LR Write 0xc2374210  13.39  0xcc0c9   32768
11:11:15.046229 Unlock    0xc2374210  2      0x1f8001
.Ed
.Pp
The
.Ar Buf
field always contains the address of the user buffer header. This can be used
to identify the requests associated with a user request, though this is not 100%
reliable: theoretically two requests in sequence could use the same buffer
header, though this is not common. The beginning of a request can be identified
by the event
.Ar 1VS
or
.Ar 7VS .
The first example above shows the requests involved in a user request. The
second is a subdisk I/O request with locking.
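.Pp
Grouping trace entries by their
.Ar Buf
value, as described above, can be sketched in a few lines of Python.
This is an illustration only: the function name group_by_buf is hypothetical, the sample lines are copied from the listing above, and the sketch assumes that the first 0x-prefixed token after the time stamp is the buffer header address, which holds for the sample but is not a documented output format.

```python
from collections import defaultdict

# A few lines copied from the sample "info -V" listing.
SAMPLE = """\
14:40:00.637758 1VS Write 0xf2361f40 91.3 0x10 16384
14:40:00.639280 2LR Write 0xf2361f40 91.3 0x10 16384
14:40:00.639294 3RQ Read 0xf2361f40 4.39 0x104109 8192 19 0 0 0
11:11:14.975184 Lock 0xc2374210 2 0x1f8001
11:11:15.018400 7VS Write 0xc2374210 0x7c0 32768 10
"""


def group_by_buf(trace):
    """Map each buffer header address to its events, in trace order."""
    groups = defaultdict(list)
    for line in trace.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        # Assumption: the first 0x-prefixed token after the time stamp
        # is the Buf column.
        buf = next((f for f in fields[1:] if f.startswith("0x")), None)
        if buf is not None:
            groups[buf].append(fields[1])
    return dict(groups)


for buf, events in group_by_buf(SAMPLE).items():
    print(buf, " ".join(events))
```

As the text notes, two consecutive requests could in theory share a buffer header, so a grouping of this kind is a debugging aid and not a reliable request identifier.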
.Pp
The
.Ar Event
field contains information related to the sequence of events in the request
chain. The digit
.Ar 1
to
.Ar 8
indicates the approximate sequence of events, and the two-letter abbreviation is
a mnemonic for the location:
.Bl -tag -width Lockwait
.It 1VS
(vinumstrategy) shows information about the user request on entry to
.Fn vinumstrategy .
The device number is the
.Nm
device, and offset and length are the user parameters. This is always the
beginning of a request sequence.
.It 2LR
(launch_requests) shows the user request just prior to launching the low-level
.Nm
requests in the function
.Fn launch_requests .
The parameters should be the same as in the
.Ar 1VS
information.
.El
.Pp
In the following requests,
.Ar Dev
is the device number of the associated disk partition,
.Ar Offset
is the offset from the beginning of the partition,
.Ar SD
is the subdisk index in
.Va vinum_conf ,
.Ar SDoff
is the offset from the beginning of the subdisk,
.Ar Doffset
is the offset of the associated data request, and
.Ar Goffset
is the offset of the associated group request, where applicable.
.Bl -tag -width Lockwait
.It 3RQ
(request) shows one of possibly several low-level
.Nm
requests which are launched to satisfy the high-level request. This information
is also logged in
.Fn launch_requests .
.It 4DN
(done) is called from
.Fn complete_rqe ,
showing the completion of a request. This completion should match a request
launched either at stage
.Ar 3RQ
from
.Fn launch_requests ,
or from
.Fn complete_raid5_write
at stage
.Ar 5RD
or
.Ar 6RP .
.It 5RD
(RAID-5 data) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 data stripe after calculating
parity.
.It 6RP
(RAID-5 parity) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 parity stripe after calculating
parity.
.It 7VS
shows a subdisk I/O request. These requests are usually internal to
.Nm
for operations like initialization or rebuilding plexes.
.It 8LR
shows the low-level operation generated for a subdisk I/O request.
.It Lockwait
specifies that the process is waiting for a range lock. The parameters are the
buffer header associated with the request, the plex number and the block number.
For internal reasons the block number is one higher than the address of the
beginning of the stripe.
.It Lock
specifies that a range lock has been obtained. The parameters are the same as
for
.Ar Lockwait .
.It Unlock
specifies that a range lock has been released. The parameters are the same as
for
.Ar Lockwait .
.El
.\" XXX
.Pp
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.Nm Ic init
initializes a subdisk by writing zeroes to it. You can initialize all subdisks
in a plex by specifying the plex name. This is the only way to ensure
consistent data in a plex. You must perform this initialization before using a
RAID-5 plex. It is also recommended for other new plexes.
.Nm
initializes all subdisks of a plex in parallel. Since this operation can take a
long time, it is normally performed in the background. If you want to wait for
completion of the command, use the
.Fl w
(wait) option.
.Pp
Specify the
.Fl S
option if you want to write blocks of a different size from the default value of
16 kB.
.Nm
prints a console message when the initialization is complete.
.Pp
.It Ic label Ar volume
The
.Ic label
command writes a
.Em ufs
style volume label on a volume. It is a simple alternative to an appropriate
call to
.Ic disklabel .
This is needed because some
.Em ufs
commands still read the disk to find the label instead of using the correct
.Xr ioctl 2
call to access it.
.Nm
maintains a volume label separately from the volume data, so this command is not
needed for
.Xr newfs 8 .
This command is deprecated.
.Pp
.It Xo
.Ic list
.Op Fl r
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic l
.Op Fl r
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
.Ic list
is used to show information about the specified object. If the argument is
omitted, information is shown about all objects known to
.Nm .
The
.Ic l
command is a synonym for
.Ic list .
.Pp
The
.Fl r
option relates to volumes and plexes: if specified, it recursively lists
information for the subdisks and (for a volume) plexes subordinate to the
objects. The commands
.Ic lv , lp , ls
and
.Ic ld
list only volumes, plexes, subdisks and drives respectively. This is
particularly useful when used without parameters.
.Pp
The
.Fl s
option causes
.Nm
to output device statistics, the
.Fl v
(verbose) option causes some additional information to be output, and the
.Fl V
option causes considerable additional information to be output.
.Pp
.It Ic makedev
The
.Ic makedev
command removes the directory
.Pa /dev/vinum
and recreates it with device nodes
which reflect the current configuration. This command is not intended for
general use, and is provided for emergency use only.
.Pp
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
The
.Ic mirror
command provides a simplified alternative to the
.Ic create
command for creating mirrored volumes. Without any options, it creates a RAID-1
(mirrored) volume with two concatenated plexes. The largest contiguous space
available on each drive is used to create the subdisks for the plexes. The
first plex is built from the odd-numbered drives in the list, and the second
plex is built from the even-numbered drives. If the drives are of different
sizes, the plexes will be of different sizes.
.Pp
If the
.Fl s
option is provided,
.Ic mirror
builds striped plexes with a stripe size of 256 kB. The size of the subdisks in
each plex is the size of the smallest contiguous storage available on any of the
drives which form the plex. Again, the plexes may differ in size.
.Pp
Normally, the
.Ic mirror
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Ic mv Fl f Ar drive object ...
.It Ic move Fl f Ar drive object ...
Move all the subdisks from the specified objects onto the new drive. The
objects may be subdisks, drives or plexes. When drives or plexes are specified,
all subdisks associated with the object are moved.
.Pp
The
.Fl f
option is required for this function, since it currently does not preserve the
data in the subdisk. This functionality will be added at a later date. In this
form, however, it is suited to recovering a failed disk drive.
.Pp
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file
in a format that can be used to recreate the
.Nm
configuration. Unlike the configuration saved on disk, it includes definitions
of the drives. If you omit
.Ar file ,
.Nm
writes the list to
.Dv stdout .
.Pp
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.Pp
.It Ic read Ar disk ...
The
.Ic read
command scans the specified disks for
.Nm
partitions containing previously created configuration information. It reads
the configuration in order from the most recently updated to least recently
updated configuration.
.Nm
maintains an up-to-date copy of all configuration information on each disk
partition. You must specify all of the slices in a configuration as the
parameter to this command.
.Pp
The
.Ic read
command is intended to selectively load a
.Nm
configuration on a system which has other
.Nm
partitions. If you want to start all partitions on the system, it is easier to
use the
.Ic start
command.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk.
This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification). You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands. Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
.It Xo
.Ic rebuildparity
.Ar plex
.Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks on the specified RAID-4 or RAID-5 plex. This
operation maintains a pointer in the plex, so it can be stopped and later
restarted from the same position if desired. In addition, this pointer is used
by the
.Ic checkparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
.Pp
If the
.Fl f
flag is specified,
.Ic rebuildparity
starts rebuilding at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic rebuildparity
first checks the existing parity blocks and prints information about those found
to be incorrect before rebuilding. If the
.Fl V
flag is specified,
.Ic rebuildparity
prints a running progress report.
.Pp
.It Xo
.Ic rename
.Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object. If the
.Fl r
option is specified, subordinate objects will be named by the default rules:
plex names will be formed by appending
.Li .p Ns Ar number
to the volume name, and
subdisk names will be formed by appending
.Li .s Ns Ar number
to the plex name.
.\".Pp
.\".It Xo
.\".Ic replace
.\".Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive. This will
.\"attempt to recover those subdisks that can be recovered, and create the others
.\"from scratch.
.\"If the new drive lacks the space for this operation, as many
.\"subdisks as possible will be fitted onto the drive, and the rest will be left on
.\"the original drive.
.Pp
.It Ic resetconfig
The
.Ic resetconfig
command completely obliterates the
.Nm
configuration on a system. Use this command only when you want to completely
delete the configuration.
.Nm
will ask for confirmation; you must type in the words
.Li "NO FUTURE"
exactly as shown:
.Bd -unfilled -offset indent
.No # Nm Ic resetconfig

WARNING! This command will completely wipe out your vinum
configuration. All data will be lost. If you really want
to do this, enter the text

NO FUTURE
.No "Enter text ->" Sy "NO FUTURE"
Vinum configuration obliterated
.Ed
.Pp
As the message suggests, this is a last-ditch command. Don't use it unless you
have an existing configuration which you never want to see again.
.Pp
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
.Nm
maintains a number of statistical counters for each object. See the header file
.Aq Pa sys/dev/vinumvar.h
for more information.
.\" XXX put it in here when it's finalized
Use the
.Ic resetstats
command to reset these counters. In conjunction with the
.Fl r
option,
.Nm
also resets the counters of subordinate objects.
.Pp
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
.Ic rm
removes an object from the
.Nm
configuration. Once an object has been removed, there is no way to recover it.
Normally
.Nm
performs a large amount of consistency checking before removing an object. The
.Fl f
option tells
.Nm
to omit this checking and remove the object anyway. Use this option with great
care: it can result in total loss of data on a volume.
.Pp
Normally,
.Nm
refuses to remove a volume or plex if it has subordinate plexes or subdisks
respectively. You can tell
.Nm
to remove the object anyway by using the
.Fl f
option, or you can cause
.Nm
to remove the subordinate objects as well by using the
.Fl r
(recursive) option. If you remove a volume with the
.Fl r
option, it will remove both the plexes and the subdisks which belong to the
plexes.
.Pp
.It Ic saveconfig
Save the current configuration to disk. Normally this is not necessary, since
.Nm
automatically saves any change in configuration. If an error occurs on startup,
updates will be disabled. When you re-enable them with the
.Ic setdaemon
command,
.Nm
does not automatically save the configuration to disk. Use this command to save
the configuration.
.\".Pp
.\".It Xo
.\".Ic set
.\".Op Fl f
.\".Ar state
.\".Ar volume | plex | subdisk | disk
.\".Xc
.\".Ic set
.\"sets the state of the specified object to one of the valid states (see
.\".Sx OBJECT STATES
.\"below). Normally
.\".Nm
.\"performs a large amount of consistency checking before making the change. The
.\".Fl f
.\"option tells
.\".Nm
.\"to omit this checking and perform the change anyway. Use this option with great
.\"care: it can result in total loss of data on a volume.
.Pp
.It Ic setdaemon Op Ar value
.Ic setdaemon
sets a variable bitmask for the
.Nm
daemon. This command is temporary and will be replaced. Currently, the bit mask
may contain the bits 1 (log every action to syslog) and 4 (don't update
configuration). Option bit 4 can be useful for error recovery.
.Pp
.It Xo
.Ic setstate Ar state
.Op Ar volume | plex | subdisk | drive
.Xc
.Ic setstate
sets the state of the specified objects to the specified state.
This bypasses
the usual consistency mechanism of
.Nm
and should be used only for recovery purposes. It is possible to crash the
system by incorrect use of this command.
.Pp
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Op Ar plex | subdisk
.Xc
.Ic start
starts (brings into the
.Em up
state) one or more
.Nm
objects.
.Pp
If no object names are specified,
.Nm
scans the disks known to the system for
.Nm
drives and then reads in the configuration as described under the
.Ic read
command. The
.Nm
drive contains a header with all information about the data stored on the drive,
including the names of the other drives which are required in order to represent
plexes and volumes.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk. This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification). You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands. Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
If object names are specified,
.Nm
starts them. Normally this operation is only of use with subdisks. The action
depends on the current state of the object:
.Bl -bullet
.It
If the object is already in the
.Em up
state,
.Nm
does nothing.
.It
If the object is a subdisk in the
.Em down
or
.Em reborn
state,
.Nm
changes it to the
.Em up
state.
.It
If the object is a subdisk in the
.Em empty
state, the change depends on the subdisk.
If it is part of a plex which is part
of a volume which contains other plexes,
.Nm
places the subdisk in the
.Em reviving
state and attempts to copy the data from the volume. When the operation
completes, the subdisk is set into the
.Em up
state. If it is part of a plex which is part of a volume which contains no
other plexes, or if it is not part of a plex,
.Nm
brings it into the
.Em up
state immediately.
.It
If the object is a subdisk in the
.Em reviving
state,
.Nm
continues the revive
operation offline. When the operation completes, the subdisk is set into the
.Em up
state.
.El
.Pp
When a subdisk comes into the
.Em up
state,
.Nm
automatically checks the state of any plex and volume to which it may belong and
changes their state where appropriate.
.Pp
If the object is a plex,
.Ic start
checks the state of the subordinate subdisks (and plexes in the case of a
volume) and starts any subdisks which can be started.
.Pp
To start a plex in a multi-plex volume, the data must be copied from another
plex in the volume. Since this frequently takes a long time, it is normally
done in the background. If you want to wait for this operation to complete (for
example, if you are performing this operation in a script), use the
.Fl w
option.
.Pp
Copying data doesn't just take a long time; it can also place a significant load
on the system. You can specify the transfer size in bytes or sectors with the
.Fl S
option, and an interval (in milliseconds) to wait between copying each block with
the
.Fl i
option. Both of these options lessen the load on the system.
.Pp
.It Xo
.Ic stop
.Op Fl f
.Op Ar volume | plex | subdisk
.Xc
If no parameters are specified,
.Ic stop
removes the
.Nm
KLD and stops
.Xr vinum 4 .
This can only be done if no objects are active. In particular, the
.Fl f
option does not override this requirement. Normally, the
.Ic stop
command writes the current configuration back to the drives before terminating.
This will not be possible if configuration updates are disabled, so
.Nm
will not stop if configuration updates are disabled. You can override this by
specifying the
.Fl f
option.
.Pp
The
.Ic stop
command can only work if
.Nm
has been loaded as a KLD, since it is not possible to unload a statically
configured driver.
.Nm Ic stop
will fail if
.Nm
is statically configured.
.Pp
If object names are specified,
.Ic stop
disables access to the objects. If the objects have subordinate objects, the
subordinate objects must either already be inactive (stopped or in error), or
the
.Fl r
and
.Fl f
options must be specified. This command does not remove the objects from the
configuration. They can be accessed again after a
.Ic start
command.
.Pp
By default,
.Nm
does not stop active objects. For example, you cannot stop a plex which is
attached to an active volume, and you cannot stop a volume which is open. The
.Fl f
option tells
.Nm
to omit this checking and stop the object anyway. Use this option with great
care and understanding: used incorrectly, it can result in serious data
corruption.
.Pp
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic stripe
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single striped plex. The size of the
subdisks is the size of the largest contiguous space available on all the
specified drives. The stripe size is fixed at 256 kB.
.Pp
Normally, the
.Ic stripe
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option is used to specify verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.El
.Sh SIMPLIFIED CONFIGURATION
This section describes a simplified interface to
.Nm
configuration using the
.Ic concat ,
.Ic mirror
and
.Ic stripe
commands. These commands create convenient configurations for some common
situations, but they are not as flexible as the
.Ic create
command.
.Pp
See above for the description of the commands. Here are some examples, all
performed with the same collection of disks. Note that the first drive,
.Pa /dev/da1h ,
is smaller than the others. This has an effect on the sizes chosen for each
kind of subdisk.
.Pp
The following examples all use the
.Fl v
option to show the commands passed to the system, and also to list the structure
of the volume. Without the
.Fl v
option, these commands produce no output.
.Ss Volume with a single concatenated plex
Use a volume with a single concatenated plex for the largest possible storage
without resilience to drive failures:
.Bd -literal
vinum -> concat -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
volume vinum0
  plex name vinum0.p0 org concat
drive vinumdrive0 device /dev/da1h
    sd name vinum0.p0.s0 drive vinumdrive0 size 0
drive vinumdrive1 device /dev/da2h
    sd name vinum0.p0.s1 drive vinumdrive1 size 0
drive vinumdrive2 device /dev/da3h
    sd name vinum0.p0.s2 drive vinumdrive2 size 0
drive vinumdrive3 device /dev/da4h
    sd name vinum0.p0.s3 drive vinumdrive3 size 0
V vinum0                State: up       Plexes:       1 Size:       2134 MB
P vinum0.p0           C State: up       Subdisks:     4 Size:       2134 MB
S vinum0.p0.s0          State: up       PO:        0  B Size:        414 MB
S vinum0.p0.s1          State: up       PO:      414 MB Size:        573 MB
S vinum0.p0.s2          State: up       PO:      988 MB Size:        573 MB
S vinum0.p0.s3          State: up       PO:     1561 MB Size:        573 MB
.Ed
.Pp
In this case, the complete space on all four disks was used, giving a volume
2134 MB in size.
.Ss Volume with a single striped plex
A volume with a single striped plex may give better performance than a
concatenated plex, but restrictions on striped plexes can mean that the volume
is smaller.
It will also not be resilient to a drive failure:
.Bd -literal
vinum -> stripe -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume vinum0
  plex name vinum0.p0 org striped 256k
    sd name vinum0.p0.s0 drive vinumdrive0 size 849825b
    sd name vinum0.p0.s1 drive vinumdrive1 size 849825b
    sd name vinum0.p0.s2 drive vinumdrive2 size 849825b
    sd name vinum0.p0.s3 drive vinumdrive3 size 849825b
V vinum0                State: up       Plexes:       1 Size:       1659 MB
P vinum0.p0           S State: up       Subdisks:     4 Size:       1659 MB
S vinum0.p0.s0          State: up       PO:        0  B Size:        414 MB
S vinum0.p0.s1          State: up       PO:      256 kB Size:        414 MB
S vinum0.p0.s2          State: up       PO:      512 kB Size:        414 MB
S vinum0.p0.s3          State: up       PO:      768 kB Size:        414 MB
.Ed
.Pp
In this case, the size of the subdisks has been limited to the size of the
smallest available disk, so the resulting volume is only 1659 MB in size.
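.Pp
The sizes reported for the striped volume can be checked by hand.
The following worked arithmetic is a sketch which assumes that the
.Dq Li b
suffix in the listing denotes blocks of 512 bytes, as described under
.Sx CONFIGURATION FILE ,
and that the megabyte figures in the listing are rounded down:
.Bd -literal -offset indent
subdisk = 849825 blocks * 512 bytes = 435110400 bytes
        = 435110400 / 1048576       = 414.95 MB   (listed as 414 MB)
plex    = 4 * 435110400 bytes       = 1740441600 bytes
        = 1740441600 / 1048576      = 1659.81 MB  (listed as 1659 MB)
.Ed
.Pp
The three larger drives each leave roughly 573 - 414 = 159 MB unused in this
configuration.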
.Ss Mirrored volume with two concatenated plexes
For more reliability, use a mirrored, concatenated volume:
.Bd -literal
vinum -> mirror -v -n mirror /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
  plex name mirror.p0 org concat
    sd name mirror.p0.s0 drive vinumdrive0 size 0b
    sd name mirror.p0.s1 drive vinumdrive2 size 0b
  plex name mirror.p1 org concat
    sd name mirror.p1.s0 drive vinumdrive1 size 0b
    sd name mirror.p1.s1 drive vinumdrive3 size 0b
V mirror                State: up       Plexes:       2 Size:       1146 MB
P mirror.p0           C State: up       Subdisks:     2 Size:        988 MB
P mirror.p1           C State: up       Subdisks:     2 Size:       1146 MB
S mirror.p0.s0          State: up       PO:        0  B Size:        414 MB
S mirror.p0.s1          State: up       PO:      414 MB Size:        573 MB
S mirror.p1.s0          State: up       PO:        0  B Size:        573 MB
S mirror.p1.s1          State: up       PO:      573 MB Size:        573 MB
.Ed
.Pp
This example specifies the name of the volume,
.Ar mirror .
Since one drive is smaller than the others, the two plexes are of different
size, and the last 158 MB of the volume is non-resilient. To ensure complete
reliability in such a situation, use the
.Ic create
command to create a volume with 988 MB.
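.Pp
A configuration file for such a fully mirrored volume might look like the
following sketch.
It is an illustration only: the subdisk lengths assume that the largest free
area is 849825 sectors on
.Pa /dev/da1h
and 1173665 sectors on each of the other drives (the figures reported by the
striped examples in this section), and the way the subdisks are paired is just
one possible choice.
.Bd -literal -offset indent
# Assumed free space: 849825 sectors on vinumdrive0,
# 1173665 sectors on vinumdrive1, vinumdrive2 and vinumdrive3
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror setupstate
 plex org concat
   sd length 849825s drive vinumdrive0
   sd length 1173665s drive vinumdrive2
 plex org concat
   sd length 1173665s drive vinumdrive1
   sd length 849825s drive vinumdrive3
.Ed
.Pp
Both plexes are then 849825 + 1173665 = 2023490 sectors (about 988 MB) long,
so every part of the volume is stored on two different drives.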
.Ss Mirrored volume with two striped plexes
Alternatively, use the
.Fl s
option to create a mirrored volume with two striped plexes:
.Bd -literal
vinum -> mirror -v -n raid10 -s /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume raid10 setupstate
  plex name raid10.p0 org striped 256k
    sd name raid10.p0.s0 drive vinumdrive0 size 849825b
    sd name raid10.p0.s1 drive vinumdrive2 size 849825b
  plex name raid10.p1 org striped 256k
    sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
    sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
V raid10                State: up       Plexes:       2 Size:       1146 MB
P raid10.p0           S State: up       Subdisks:     2 Size:        829 MB
P raid10.p1           S State: up       Subdisks:     2 Size:       1146 MB
S raid10.p0.s0          State: up       PO:        0  B Size:        414 MB
S raid10.p0.s1          State: up       PO:      256 kB Size:        414 MB
S raid10.p1.s0          State: up       PO:        0  B Size:        573 MB
S raid10.p1.s1          State: up       PO:      256 kB Size:        573 MB
.Ed
.Pp
In this case, the usable part of the volume is even smaller, since the first
plex has shrunk to match the smallest drive.
.Sh CONFIGURATION FILE
.Nm
requires that all parameters to the
.Ic create
command be in a configuration file. Entries in the configuration file
define volumes, plexes and subdisks, and may be in free format, except that each
entry must be on a single line.
.Ss Scale factors
Some configuration file parameters specify a size (lengths, stripe sizes).
These values can be specified as bytes, or one of the following scale factors
may be appended:
.Bl -tag -width indent
.It s
specifies that the value is a number of sectors of 512 bytes.
.It k
specifies that the value is a number of kilobytes (1024 bytes).
.It m
specifies that the value is a number of megabytes (1048576 bytes).
.It g
specifies that the value is a number of gigabytes (1073741824 bytes).
.It b
is used for compatibility with
.Tn VERITAS .
It stands for blocks of 512 bytes.
This abbreviation is confusing, since the word
.Dq block
is used with different
meanings, and its use is deprecated.
.El
.Pp
For example, the value 16777216 bytes can also be written as
.Em 16m ,
.Em 16384k
or
.Em 32768s .
.Pp
The configuration file can contain the following entries:
.Bl -tag -width 4n
.It Ic drive Ar name devicename Op Ar options
Define a drive. The options are:
.Bl -tag -width 18n
.It Cm device Ar devicename
Specify the device on which the drive resides.
.Ar devicename
must be the name of a disk partition, for example
.Pa /dev/da1e
or
.Pa /dev/ad3s2h ,
and it must be of type
.Em vinum .
Do not use the
.Dq Li c
partition, which is reserved for the complete disk.
.It Cm hotspare
Define the drive to be a
.Dq hot spare
drive, which is maintained to automatically replace a failed drive.
.Nm
does not allow this drive to be used for any other purpose. In particular, it
is not possible to create subdisks on it. This functionality has not been
completely implemented.
.El
.It Ic volume Ar name Op Ar options
Define a volume with name
.Ar name .
Options are:
.Bl -tag -width 18n
.It Cm plex Ar plexname
Add the specified plex to the volume. If
.Ar plexname
is specified as
.Cm * ,
.Nm
will look for the definition of the plex as the next possible entry in the
configuration file after the definition of the volume.
.It Cm readpol Ar policy
Define a
.Em read policy
for the volume.
.Ar policy
may be either
.Cm round
or
.Cm prefer Ar plexname .
.Nm
satisfies a read request from only one of the plexes. A
.Cm round
read policy specifies that each read should be performed from a different plex
in
.Em round-robin
fashion. A
.Cm prefer
read policy reads from the specified plex every time.
.It Cm setupstate
When creating a multi-plex volume, assume that the contents of all the plexes
are consistent. This is normally not the case, so by default
.Nm
sets all plexes except the first one to the
.Em faulty
state. Use the
.Ic start
command to first bring them to a consistent state. In the case of striped and
concatenated plexes, however, it does not normally cause problems to leave them
inconsistent: when using a volume for a file system or a swap partition, the
previous contents of the disks are not of interest, so they may be ignored.
If you want to take this risk, use the
.Cm setupstate
keyword. It will only apply to the plexes defined immediately after the volume
in the configuration file. If you add plexes to a volume at a later time, you
must integrate them manually with the
.Ic start
command.
.Pp
Note that you
.Em must
use the
.Ic init
command with RAID-5 plexes: otherwise extreme data corruption will result if one
subdisk fails.
.El
.It Ic plex Op Ar options
Define a plex. Unlike a volume, a plex does not need a name. The options may
be:
.Bl -tag -width 18n
.It Cm name Ar plexname
Specify the name of the plex. Note that you must use the keyword
.Cm name
when naming a plex or subdisk.
.It Cm org Ar organization Op Ar stripesize
Specify the organization of the plex.
.Ar organization
can be one of
.Cm concat , striped
or
.Cm raid5 .
For
.Cm striped
and
.Cm raid5
plexes, the parameter
.Ar stripesize
must be specified, while for
.Cm concat
it must be omitted.
For type
.Cm striped ,
it specifies the width of each stripe. For type
.Cm raid5 ,
it specifies the size of a group. A group is a portion of a plex which
stores all of its parity bits in the same subdisk. It must be a factor of the
plex size (in
other words, the result of dividing the plex size by the stripe size must be an
integer), and it must be a multiple of a disk sector (512 bytes).
.Pp
For optimum performance, stripes should be at least 128 kB in size: anything
smaller will result in a significant increase in I/O activity due to mapping of
individual requests over multiple disks. The performance improvement due to the
increased number of concurrent transfers caused by this mapping will not make up
for the performance drop due to the increase in latency. A good guideline for
stripe size is between 256 kB and 512 kB. Avoid powers of 2, however: they tend
to cause all superblocks to be placed on the first subdisk.
.Pp
A striped plex must have at least two subdisks (otherwise it is a concatenated
plex), and each must be the same size. A RAID-5 plex must have at least three
subdisks, and each must be the same size. In practice, a RAID-5 plex should
have at least 5 subdisks.
.It Cm volume Ar volname
Add the plex to the specified volume. If no
.Cm volume
keyword is specified, the plex will be added to the last volume mentioned in the
configuration file.
.It Cm sd Ar sdname offset
Add the specified subdisk to the plex at offset
.Ar offset .
.El
.It Ic subdisk Op Ar options
Define a subdisk. Options may be:
.Bl -hang -width 18n
.It Cm name Ar name
Specify the name of a subdisk. It is not necessary to specify a name for a
subdisk; see
.Sx OBJECT NAMING
above. Note that you must specify the keyword
.Cm name
if you wish to name a subdisk.
.It Cm plexoffset Ar offset
Specify the starting offset of the subdisk in the plex. If not specified,
.Nm
allocates the space immediately after the previous subdisk, if any, or otherwise
at the beginning of the plex.
.It Cm driveoffset Ar offset
Specify the starting offset of the subdisk in the drive. If not specified,
.Nm
allocates the first contiguous
.Ar length
bytes of free space on the drive.
.It Cm length Ar length
Specify the length of the subdisk. This keyword must be specified. There is no
default, but the value 0 may be specified to mean
.Dq "use the largest available contiguous free area on the drive" .
If the drive is empty, this means that the entire drive will be used for the
subdisk.
.Cm length
may be shortened to
.Cm len .
.It Cm plex Ar plex
Specify the plex to which the subdisk belongs. By default, the subdisk belongs
to the last plex specified.
.It Cm drive Ar drive
Specify the drive on which the subdisk resides. By default, the subdisk resides
on the last drive specified.
.El
.El
.Sh EXAMPLE CONFIGURATION FILE
.Bd -literal
# Sample vinum configuration file
#
# Our drives
drive drive1 device /dev/da1h
drive drive2 device /dev/da2h
drive drive3 device /dev/da3h
drive drive4 device /dev/da4h
drive drive5 device /dev/da5h
drive drive6 device /dev/da6h
# A volume with one striped plex
volume tinyvol
 plex org striped 512b
  sd length 64m drive drive2
  sd length 64m drive drive4
volume stripe
 plex org striped 512b
  sd length 512m drive drive2
  sd length 512m drive drive4
# Two plexes
volume concat
 plex org concat
  sd length 100m drive drive2
  sd length 50m drive drive4
 plex org concat
  sd length 150m drive drive4
# A volume with one striped plex and one concatenated plex
volume strcon
 plex org striped 512b
  sd length 100m drive drive2
  sd length 100m drive drive4
 plex org concat
  sd length 150m drive drive2
  sd length 50m drive drive4
# a volume with a RAID-5 and a striped plex
# note that the RAID-5 plex is longer by
# the length of one subdisk
volume vol5
 plex org striped 64k
  sd length 1000m drive drive2
  sd length 1000m drive drive4
 plex org raid5 32k
  sd length 500m drive drive1
  sd length 500m drive drive2
  sd length 500m drive drive3
  sd length 500m drive drive4
  sd length 500m drive drive5
.Ed
.Sh DRIVE LAYOUT CONSIDERATIONS
.Nm
drives are currently
.Bx
disk partitions. They must be of type
.Em vinum
in order to avoid overwriting data used for other purposes. Use
.Nm disklabel Fl e
to edit a partition type definition. The following display shows a typical
partition layout as shown by
.Xr disklabel 8 :
.Bd -literal
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:    81920   344064    4.2BSD        0     0     0   # (Cyl.  240*- 297*)
  b:   262144    81920      swap                        # (Cyl.   57*- 240*)
  c:  4226725        0    unused        0     0         # (Cyl.    0 - 2955*)
  e:    81920        0    4.2BSD        0     0     0   # (Cyl.    0 - 57*)
  f:  1900000   425984    4.2BSD        0     0     0   # (Cyl.  297*- 1626*)
  g:  1900741  2325984     vinum        0     0     0   # (Cyl. 1626*- 2955*)
.Ed
.Pp
In this example, partition
.Dq Li g
may be used as a
.Nm
partition. Partitions
.Dq Li a ,
.Dq Li e
and
.Dq Li f
may be used as
.Em UFS
file systems or
.Em ccd
partitions. Partition
.Dq Li b
is a swap partition, and partition
.Dq Li c
represents the whole disk and should not be used for any other purpose.
.Pp
.Nm
uses the first 265 sectors on each partition for configuration information, so
the maximum size of a subdisk is 265 sectors smaller than the drive.
.Sh LOG FILE
.Nm
maintains a log file, by default
.Pa /var/tmp/vinum_history ,
in which it keeps track of the commands issued to
.Nm .
You can override the name of this file by setting the environment variable
.Ev VINUM_HISTORY
to the name of the file.
.Pp
Each message in the log file is preceded by a date. The default format is
.Qq Li %e %b %Y %H:%M:%S .
See
.Xr strftime 3
for further details of the format string. It can be overridden by the
environment variable
.Ev VINUM_DATEFORMAT .
.Sh HOW TO SET UP VINUM
This section gives practical advice about how to implement a
.Nm
system.
.Ss Where to put the data
The first choice you need to make is where to put the data. You need dedicated
disk partitions for
.Nm .
They should be partitions, not devices, and they should not be partition
.Dq Li c .
For example, good names are
.Pa /dev/da0e
or
.Pa /dev/ad3s4a .
Bad names are
.Pa /dev/da0
and
.Pa /dev/da0s1 ,
both of which represent a device, not a partition, and
.Pa /dev/ad1c ,
which represents a complete disk and should be of type
.Em unused .
See the example under
.Sx DRIVE LAYOUT CONSIDERATIONS
above.
.Ss Designing volumes
The way you set up
.Nm
volumes depends on your intentions. There are a number of possibilities:
.Bl -enum
.It
You may want to join up a number of small disks to make a reasonably sized file
system. For example, if you had five small drives and wanted to use all the
space for a single volume, you might write a configuration file like:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
drive d5 device /dev/da6e
volume bigger
 plex org concat
   sd length 0 drive d1
   sd length 0 drive d2
   sd length 0 drive d3
   sd length 0 drive d4
   sd length 0 drive d5
.Ed
.Pp
In this case, you specify the length of the subdisks as 0, which means
.Dq "use the largest area of free space that you can find on the drive" .
If the subdisk is the only subdisk on the drive, it will use all available
space.
.It
You want to set up
.Nm
to obtain additional resilience against disk failures. You have the choice of
RAID-1, also called
.Dq mirroring ,
or RAID-5, also called
.Dq parity .
.Pp
To set up mirroring, create multiple plexes in a volume. For example, to create
a mirrored volume of 2 GB, you might create the following configuration file:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
volume mirror
 plex org concat
   sd length 2g drive d1
 plex org concat
   sd length 2g drive d2
.Ed
.Pp
When creating mirrored drives, it is important to ensure that the data from each
plex is on a different physical disk so that
.Nm
can access the complete address space of the volume even if a drive fails.
Note that each plex requires as much storage as the complete volume: in this
example, the volume has a size of 2 GB, but each plex (and each subdisk)
requires 2 GB, so the total disk storage requirement is 4 GB.
.Pp
To set up RAID-5, create a single plex of type
.Cm raid5 .
For example, to create an equivalent resilient volume of 2 GB, you might use the
following configuration file:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
drive d5 device /dev/da6e
volume raid
 plex org raid5 512k
   sd length 512m drive d1
   sd length 512m drive d2
   sd length 512m drive d3
   sd length 512m drive d4
   sd length 512m drive d5
.Ed
.Pp
RAID-5 plexes require at least three subdisks, one of which is used for storing
parity information and is lost for data storage. The more disks you use, the
greater the proportion of the disk storage that can be used for data storage. In
this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored
configuration. If you were to use the minimum of only three disks, you would
require 3 GB to store the information, for example:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
volume raid
 plex org raid5 512k
   sd length 1g drive d1
   sd length 1g drive d2
   sd length 1g drive d3
.Ed
.Pp
As with creating mirrored drives, it is important to ensure that the data from
each subdisk is on a different physical disk so that
.Nm
can access the complete address space of the volume even if a drive fails.
.It
You want to set up
.Nm
to allow more concurrent access to a file system. In many cases, access to a
file system is limited by the speed of the disk.
By spreading the volume across
multiple disks, you can increase the throughput in multi-access environments.
This technique shows little or no performance improvement in single-access
environments.
.Nm
uses a technique called
.Dq striping ,
or sometimes RAID-0, to increase this concurrency of access. The name RAID-0 is
misleading: striping does not provide any redundancy or additional reliability.
In fact, it decreases the reliability, since the failure of a single disk will
render the volume useless, and the more disks you have, the more likely it is
that one of them will fail.
.Pp
To implement striping, use a
.Cm striped
plex:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid
 plex org striped 512k
   sd length 512m drive d1
   sd length 512m drive d2
   sd length 512m drive d3
   sd length 512m drive d4
.Ed
.Pp
A striped plex must have at least two subdisks, but the increase in performance
is greater if you have a larger number of disks.
.It
You may want to have the best of both worlds and have both resilience and
performance. This is sometimes called RAID-10 (a combination of RAID-1 and
RAID-0), though again this name is misleading.
With
.Nm
you can do this with the following configuration file:
.Bd -literal -offset indent
drive d1 device /dev/da2e
drive d2 device /dev/da3e
drive d3 device /dev/da4e
drive d4 device /dev/da5e
volume raid setupstate
 plex org striped 512k
   sd length 512m drive d1
   sd length 512m drive d2
   sd length 512m drive d3
   sd length 512m drive d4
 plex org striped 512k
   sd length 512m drive d4
   sd length 512m drive d3
   sd length 512m drive d2
   sd length 512m drive d1
.Ed
.Pp
Here the plexes are striped, increasing performance, and there are two of them,
increasing reliability. Note that this example shows the subdisks of the second
plex in reverse order from the first plex. This is for performance reasons and
will be discussed below. In addition, the volume specification includes the
keyword
.Cm setupstate ,
which ensures that all plexes are
.Em up
after creation.
.El
.Ss Creating the volumes
Once you have created your configuration files, start
.Nm
and create the volumes.
In this example, the configuration is in the file
.Pa configfile :
.Bd -literal -offset 2n
# vinum create -v configfile
   1: drive d1 device /dev/da2e
   2: drive d2 device /dev/da3e
   3: volume mirror
   4:   plex org concat
   5:     sd length 2g drive d1
   6:   plex org concat
   7:     sd length 2g drive d2
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         2 (8 configured)
Subdisks:       2 (16 configured)

Drive d1:       Device /dev/da2e
                Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
                Config last updated Tue Mar 23 14:30:32 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none
Drive d2:       Device /dev/da3e
                Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
                Config last updated Tue Mar 23 14:30:33 1999
                Size:      60105216000 bytes (57320 MB)
                Used:       2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none

Volume mirror:  Size: 2147483648 bytes (2048 MB)
                State: up
                Flags:
                2 plexes
                Read policy: round robin

Plex mirror.p0: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror
Plex mirror.p1: Size:   2147483648 bytes (2048 MB)
                Subdisks:        1
                State: up
                Organization: concat
                Part of volume mirror

Subdisk mirror.p0.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p0 at offset 0

Subdisk mirror.p1.s0:
                Size:       2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p1 at offset 0
.Ed
.Pp
The
.Fl v
option tells
.Nm
to list the configuration file as it processes it. It then lists the resulting
configuration in the same format as the
.Ic list Fl v
command.
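.Pp
The drive figures in the listing above can be cross-checked with a short Python
sketch. It is an illustration only and not part of
.Nm :
the helper name
.Li to_mb
is hypothetical, and the sketch assumes that the MB values in the listing are
byte counts divided by 1048576 and rounded down, which is inferred from the
listing rather than taken from the
.Nm
sources.
.Bd -literal -offset indent
```python
MB = 1024 * 1024  # assumption: one MB in the listing is 2**20 bytes


def to_mb(nbytes):
    """Convert a byte count to whole MB, rounding down (assumed behaviour)."""
    return nbytes // MB


# Figures for drive d1, copied from the verbose listing above.
size = 60105216000
used = 2147619328

# The space still available is what remains after the used space.
available = size - used

print(available, to_mb(size), to_mb(used), to_mb(available))
```
.Ed
.Pp
Under these assumptions the results agree with the
.Dq Available
line and the MB values shown for drive
.Li d1 .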
.Ss Creating more volumes
Once you have created the
.Nm
volumes,
.Nm
keeps track of them in its internal configuration files. You do not need to
create them again. In particular, if you run the
.Ic create
command again, you will create additional objects:
.Bd -literal
# vinum create sampleconfig
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         4 (8 configured)
Subdisks:       4 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 53224/57320 MB (92%)
D d2                    State: up       Device /dev/da3e        Avail: 53224/57320 MB (92%)

V mirror                State: up       Plexes:       4 Size:       2048 MB

P mirror.p0           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: up       Subdisks:     1 Size:       2048 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
.Ed
.Pp
As this example (this time without the
.Fl v
option) shows, re-running the
.Ic create
command has created two new plexes, each with a new subdisk, so that the volume
now has four plexes. If you want to add other
volumes, create new configuration files for them. They do not need to define
again the drives that
.Nm
already knows about.
For example, to create a volume
.Pa raid
on the four drives
.Pa /dev/da1e , /dev/da2e , /dev/da3e
and
.Pa /dev/da4e ,
you only need to define the two drives that are new:
.Bd -literal -offset indent
drive d3 device /dev/da1e
drive d4 device /dev/da4e
volume raid
  plex org raid5 512k
    sd size 2g drive d1
    sd size 2g drive d2
    sd size 2g drive d3
    sd size 2g drive d4
.Ed
.Pp
With this configuration file, we get:
.Bd -literal
# vinum create newconfig
Configuration summary

Drives:         4 (4 configured)
Volumes:        2 (4 configured)
Plexes:         5 (8 configured)
Subdisks:       8 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 51176/57320 MB (89%)
D d2                    State: up       Device /dev/da3e        Avail: 51176/57320 MB (89%)
D d3                    State: up       Device /dev/da1e        Avail: 53224/57320 MB (92%)
D d4                    State: up       Device /dev/da4e        Avail: 53224/57320 MB (92%)

V mirror                State: down     Plexes:       4 Size:       2048 MB
V raid                  State: down     Plexes:       1 Size:       6144 MB

P mirror.p0           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: init     Subdisks:     1 Size:       2048 MB
P raid.p0            R5 State: init     Subdisks:     4 Size:       6144 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
S raid.p0.s0            State: empty    PO:        0  B Size:       2048 MB
S raid.p0.s1            State: empty    PO:      512 kB Size:       2048 MB
S raid.p0.s2            State: empty    PO:     1024 kB Size:       2048 MB
S raid.p0.s3            State: empty    PO:     1536 kB Size:       2048 MB
.Ed
.Pp
Note the size of the RAID-5 plex: it is only 6 GB, although together its
components use 8 GB of disk space. This is because the equivalent of one
subdisk is used for storing parity data.
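.Pp
The capacity arithmetic for the RAID-5 plex can be written out as a short Python
sketch. It is an illustration only and not part of
.Nm :
the function name
.Li raid5_usable_mb
is hypothetical, and the sketch assumes that all subdisks are the same size and
that exactly the equivalent of one subdisk holds parity, as described above.
.Bd -literal -offset indent
```python
def raid5_usable_mb(subdisks, subdisk_mb):
    """Return the usable size in MB of a RAID-5 plex (hypothetical helper).

    Assumes equal-sized subdisks, with the equivalent of one subdisk
    taken up by parity data.
    """
    return (subdisks - 1) * subdisk_mb


# Four subdisks of 2 GB (2048 MB) each, as in the configuration above.
raw_mb = 4 * 2048
usable_mb = raid5_usable_mb(4, 2048)

print(raw_mb, usable_mb)
```
.Ed
.Pp
For the example volume this gives 8192 MB of raw space and 6144 MB of usable
space, matching the size reported for
.Li raid.p0 .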
.Ss Restarting Vinum
On rebooting the system, start
.Nm
with the
.Ic start
command:
.Pp
.Dl "# vinum start"
.Pp
This will start all the
.Nm
drives in the system. If for some reason you wish to start only some of them,
use the
.Ic read
command.
.Ss Performance considerations
A number of misconceptions exist about how to set up a RAID array for best
performance. In particular, most systems use far too small a stripe size. The
following discussion applies to all RAID systems, not just to
.Nm .
.Pp
The
.Fx
block I/O system issues requests of between 0.5 kB and 128 kB; a
typical mix is somewhere around 8 kB. You can't stop any striping system from
breaking a request into two physical requests, and if you make the stripe small
enough, it can be broken into several. This will result in a significant drop
in performance: the decrease in transfer time per disk is far outweighed by the
increase in latency, which is an order of magnitude greater.
.Pp
With modern disk sizes and the
.Fx
I/O system, you can expect to have a
reasonably small number of fragmented requests with a stripe size between 256 kB
and 512 kB; with correct RAID implementations there is no obvious reason not to
increase the size to 2 or 4 MB on a large disk.
.Pp
When choosing a stripe size, consider that most current UFS file systems have
cylinder groups 32 MB in size. If you have a stripe size and number of disks
both of which are a power of two, it is probable that all superblocks and inodes
will be placed on the same subdisk, which will impact performance significantly.
Choose an odd number instead, for example 479 kB.
.Pp
The easiest way to consider the impact of any transfer in a multi-access system
is to look at it from the point of view of the potential bottleneck, the disk
subsystem: how much total disk time does the transfer use?
Since just about
everything is cached, the time relationship between the request and its
completion is not so important: the important parameter is the total time that
the request keeps the disks active, the time when the disks are not available to
perform other transfers. As a result, it doesn't really matter if the transfers
are happening at the same time or different times. In practical terms, the time
we're looking at is the sum of the total latency (positioning time and
rotational latency, or the time it takes for the data to arrive under the disk
heads) and the total transfer time. For a given transfer to disks of the same
speed, the transfer time depends only on the total size of the transfer.
.Pp
Consider a typical news article or web page of 24 kB, which will probably be
read in a single I/O. Take disks with a transfer rate of 12 MB/s and an average
positioning time of 8 ms, and a file system with 4 kB blocks. Since it's 24 kB,
we don't have to worry about fragments, so the file will start on a 4 kB
boundary. The number of transfers required depends on where the block starts:
on average it's (S + F - 1) / S, where S is the stripe size in file system
blocks, and F is the file size in file system blocks.
.Bl -enum
.It
Stripe size of 4 kB. You'll have 6 transfers. Total subsystem load: 48 ms
latency, 2 ms transfer, 50 ms total.
.It
Stripe size of 8 kB. On average, you'll have 3.5 transfers. Total subsystem
load: 28 ms latency, 2 ms transfer, 30 ms total.
.It
Stripe size of 16 kB. On average, you'll have 2.25 transfers. Total subsystem
load: 18 ms latency, 2 ms transfer, 20 ms total.
.It
Stripe size of 256 kB. On average, you'll have 1.08 transfers. Total subsystem
load: 8.6 ms latency, 2 ms transfer, 10.6 ms total.
.It
Stripe size of 4 MB. On average, you'll have 1.0009 transfers.
Total subsystem
load: 8.01 ms latency, 2 ms transfer, 10.01 ms total.
.El
.Pp
It appears that some hardware RAID systems have problems with large stripes:
they appear to always transfer a complete stripe to or from disk, so that a
large stripe size will have an adverse effect on performance.
.Nm
does not suffer from this problem: it optimizes all disk transfers and does not
transfer unneeded data.
.Pp
Note that no well-known benchmark program tests true multi-access conditions
(more than 100 concurrent users), so it is difficult to demonstrate the validity
of these statements.
.Pp
Given these considerations, the following factors affect the performance of a
.Nm
volume:
.Bl -bullet
.It
Striping improves performance for multiple access only, since it increases the
chance of individual requests being on different drives.
.It
Concatenating UFS file systems across multiple drives can also improve
performance for multiple file access, since UFS divides a file system into
cylinder groups and attempts to keep files in a single cylinder group. In
general, it is not as effective as striping.
.It
Mirroring can improve multi-access performance for reads, since by default
.Nm
issues consecutive reads to consecutive plexes.
.It
Mirroring decreases performance for all writes, whether multi-access or single
access, since the data must be written to both plexes. This explains the
subdisk layout in the example of a mirroring configuration above: if the
corresponding subdisk in each plex is on a different physical disk, the write
commands can be issued in parallel, whereas if they are on the same physical
disk, they will be performed sequentially.
.It
RAID-5 reads have essentially the same considerations as striped reads, unless
the striped plex is part of a mirrored volume, in which case the performance of
the mirrored volume will be better.
.It
RAID-5 writes are approximately 25% of the speed of striped writes: to perform
the write,
.Nm
must first read the data block and the corresponding parity block, perform some
calculations and write back the parity block and the data block, four times as
many transfers as for writing a striped plex. On the other hand, this is offset
by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
approximately half the speed of writes to a correctly configured volume with two
striped plexes.
.It
When the
.Nm
configuration changes (for example, adding or removing objects, or the change of
state of one of the objects),
.Nm
writes up to 128 kB of updated configuration to each drive. The larger the
number of drives, the longer this takes.
.El
.Ss Creating file systems on Vinum volumes
You do not need to run
.Xr disklabel 8
before creating a file system on a
.Nm
volume. Just run
.Xr newfs 8 .
Use the
.Fl v
option to state that the device is not divided into partitions. For example, to
create a file system on volume
.Pa mirror ,
enter the following command:
.Pp
.Dl "# newfs -v /dev/vinum/mirror"
.Pp
A number of other considerations apply to
.Nm
configuration:
.Bl -bullet
.It
There is no advantage in creating multiple drives on a single disk. Each drive
uses 131.5 kB of data for label and configuration information, and performance
will suffer when the configuration changes. Use appropriately sized subdisks instead.
.It
It is possible to increase the size of a concatenated
.Nm
plex, but currently the size of striped and RAID-5 plexes cannot be increased.
Currently the size of an existing UFS file system also cannot be increased, but
it is planned to make both plexes and file systems extensible.
.El
.Sh STATE MANAGEMENT
Vinum objects have the concept of
.Em state .
See
.Xr vinum 4
for more details. They are only completely accessible if their state is
.Em up .
To change an object state to
.Em up ,
use the
.Ic start
command. To change an object state to
.Em down ,
use the
.Ic stop
command. Normally other states are created automatically by the relationship
between objects. For example, if you add a plex to a volume, the subdisks of
the plex will be set in the
.Em empty
state, indicating that, though the hardware is accessible, the data on the
subdisk is invalid. As a result of this state, the plex will be set in the
.Em faulty
state.
.Ss The `reviving' state
In many cases, when you start a subdisk the system must copy data to the
subdisk. Depending on the size of the subdisk, this can take a long time.
During this time, the subdisk is set in the
.Em reviving
state. On successful completion of the copy operation, it is automatically set
to the
.Em up
state. It is possible for the process performing the revive to be stopped and
restarted. The system keeps track of how far the subdisk has been revived, and
when the
.Ic start
command is reissued, the copying continues from this point.
.Pp
In order to maintain the consistency of a volume while one or more of its plexes
is being revived,
.Nm
writes to subdisks which have been revived up to the point of the write. It may
also read from the plex if the area being read has already been revived.
.Sh GOTCHAS
The following points are not bugs, and they have good reasons for existing, but
they have been shown to cause confusion. Each is discussed in the appropriate
section above.
.Bl -enum
.It
.Nm
drives are
.Ux
disk partitions and must have the partition type
.Em vinum .
This is different from
.Xr ccd 4 ,
which expects partitions of type
.Em 4.2BSD .
This behaviour of
.Nm ccd
is an invitation to shoot yourself in the foot: with
.Nm ccd
you can easily overwrite a file system.
.Nm
will not permit this.
.Pp
For similar reasons, the
.Nm Ic start
command will not accept a drive on partition
.Dq Li c .
Partition
.Dq Li c
is used by the system to represent the whole disk, and must be of type
.Em unused .
Clearly there is a conflict here, which
.Nm
resolves by not using the
.Dq Li c
partition.
.It
When you create a volume with multiple plexes,
.Nm
does not automatically initialize the plexes. This means that the contents are
not known, but they are certainly not consistent. As a result, by default
.Nm
sets the state of all newly-created plexes except the first to
.Em faulty .
In order to synchronize them with the first plex, you must
.Ic start
them, which causes
.Nm
to copy the data from a plex which is in the
.Em up
state. Depending on the size of the subdisks involved, this can take a long
time.
.Pp
In practice, people aren't too interested in what was in the plex when it was
created, and other volume managers cheat by setting them
.Em up
anyway.
.Nm
provides two ways to ensure that newly created plexes are
.Em up :
.Bl -bullet
.It
Create the plexes and then synchronize them with
.Nm Ic start .
.It
Create the volume (not the plex) with the keyword
.Cm setupstate ,
which tells
.Nm
to ignore any possible inconsistency and set the plexes to be
.Em up .
.El
.It
Some of the commands currently supported by
.Nm
are not really needed.
For reasons which I don't understand, however, I find
that users frequently try the
.Ic label
and
.Ic resetconfig
commands, though especially
.Ic resetconfig
outputs all sorts of dire warnings. Don't use these commands unless you have a
good reason to do so.
.It
Some state transitions are not very intuitive. In fact, it's not clear whether
this is a bug or a feature. If you find that you can't start an object in some
strange state, such as a
.Em reborn
subdisk, try first to get it into
.Em stopped
state, with the
.Ic stop
or
.Ic stop Fl f
commands. If that works, you should then be able to start it. If you find
that this is the only way to get out of a position where easier methods fail,
please report the situation.
.It
If you build the kernel module with the
.Fl D Ns Dv VINUMDEBUG
option, you must also build
.Nm
with the
.Fl D Ns Dv VINUMDEBUG
option, since the size of some data objects used by both components depends on
this option. If you don't do so, commands will fail with the message
.Sy Invalid argument ,
and a console message will be logged such as
.Bl -diag
.It "vinumioctl: invalid ioctl from process 247 (vinum): c0e44642"
.El
.Pp
This error may also occur if you use an old version of the KLD or of the
userland program.
.It
The
.Nm Ic read
command has a particularly emetic syntax. Once it was the only way to start
.Nm ,
but now the preferred method is with
.Nm Ic start .
.Nm Ic read
should be used for maintenance purposes only. Note that its syntax has changed,
and the arguments must be disk slices, such as
.Pa /dev/da0 ,
not partitions such as
.Pa /dev/da0e .
.El
.\"XXX.Sh BUGS
.Sh FILES
.Bl -tag -width /dev/vinum/control -compact
.It Pa /dev/vinum
directory with device nodes for
.Nm
objects
.It Pa /dev/vinum/control
control device for
.Nm
.It Pa /dev/vinum/plex
directory containing device nodes for
.Nm
plexes
.It Pa /dev/vinum/sd
directory containing device nodes for
.Nm
subdisks
.El
.Sh ENVIRONMENT
.Bl -tag -width VINUM_DATEFORMAT
.It Ev VINUM_HISTORY
The name of the log file, by default
.Pa /var/log/vinum_history .
.It Ev VINUM_DATEFORMAT
The format of dates in the log file, by default
.Qq Li %e %b %Y %H:%M:%S .
.It Ev EDITOR
The name of the editor to use for editing configuration files, by default
.Nm vi .
.El
.Sh SEE ALSO
.Xr strftime 3 ,
.Xr vinum 4 ,
.Xr disklabel 8 ,
.Xr newfs 8
.Pp
.Pa http://www.vinumvm.org/vinum/ ,
.Pa http://www.vinumvm.org/vinum/how-to-debug.html .
.Sh AUTHORS
.An Greg Lehey Aq grog@lemis.com
.Sh HISTORY
The
.Nm
command first appeared in
.Fx 3.0 .
The RAID-5 component of
.Nm
was developed for Cybernet Inc.\&
.Pq Pa www.cybernet.com
for its NetMAX product.