.\" Hey, Emacs, edit this file in -*- nroff-fill -*- mode
.\"-
.\" Copyright (c) 1997, 1998
.\"	Nan Yang Computer Services Limited.  All rights reserved.
.\"
.\" This software is distributed under the so-called ``Berkeley
.\" License'':
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by Nan Yang Computer
.\"      Services Limited.
.\" 4. Neither the name of the Company nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" This software is provided ``as is'', and any express or implied
.\" warranties, including, but not limited to, the implied warranties of
.\" merchantability and fitness for a particular purpose are disclaimed.
.\" In no event shall the company or contributors be liable for any
.\" direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute
.\" goods or services; loss of use, data, or profits; or business
.\" interruption) however caused and on any theory of liability, whether
.\" in contract, strict liability, or tort (including negligence or
.\" otherwise) arising in any way out of the use of this software, even if
.\" advised of the possibility of such damage.
.\"
.\" $Id: vinum.8,v 1.48 2001/01/15 22:15:05 grog Exp $
.\" $FreeBSD: src/sbin/vinum/vinum.8,v 1.33.2.10 2002/12/29 16:35:38 schweikh Exp $
.\" $DragonFly: src/sbin/vinum/vinum.8,v 1.3 2004/03/11 12:28:55 hmp Exp $
.\"
.Dd December 20, 2000
.Dt VINUM 8
.Os
.Sh NAME
.Nm vinum
.Nd Logical Volume Manager control program
.Sh SYNOPSIS
.Nm
.Op Ar command
.Op Fl options
.Sh COMMANDS
.Bl -tag -width indent
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
Attach a plex to a volume, or a subdisk to a plex.
.It Xo
.Ic checkparity Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks of a RAID-4 or RAID-5 plex.
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a concatenated volume from the specified drives.
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
Create a volume as described in
.Ar description-file .
.It Ic debug
Cause the volume manager to enter the kernel debugger.
.It Ic debug Ar flags
Set debugging flags.
.It Xo
.Ic detach
.Op Fl f
.Op Ar plex | subdisk
.Xc
Detach a plex or subdisk from the volume or plex to which it is attached.
.It Ic dumpconfig Op Ar drive ...
List the configuration information stored on the specified drives, or all drives
in the system if no drive names are specified.
.It Xo
.Ic info
.Op Fl v
.Op Fl V
.Xc
List information about volume manager state.
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.\" XXX
Initialize the contents of a subdisk or all the subdisks of a plex to all zeros.
.It Ic label Ar volume
Create a volume label.
.It Xo
.Ic l | list
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
List information about specified objects.
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
List information about drives.
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
List information about subdisks.
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
List information about plexes.
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
List information about volumes.
.It Ic makedev
Remake the device nodes in
.Pa /dev/vinum .
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
Create a mirrored volume from the specified drives.
.It Xo
.Ic move | mv
.Fl f
.Ar drive object ...
.Xc
Move the object(s) to the specified drive.
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file .
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.It Ic read Ar disk ...
Read the
.Nm
configuration from the specified disks.
.It Xo
.Ic rename Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object.
.\" XXX
.\".It Ic replace Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive.
.It Xo
.Ic rebuildparity Ar plex Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks of a RAID-4 or RAID-5 plex.
.It Ic resetconfig
Reset the complete
.Nm
configuration.
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
Reset statistics counters for the specified objects, or for all objects if none
are specified.
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
Remove an object.
.It Ic saveconfig
Save
.Nm
configuration to disk after configuration failures.
.\" XXX
.\".It Xo
.\".Ic set
.\".Op Fl f
.\".Ar state
.\".Ar volume | plex | subdisk | disk
.\".Xc
.\"Set the state of the object to
.\".Ar state .
.It Ic setdaemon Op Ar value
Set daemon configuration.
.It Xo
.Ic setstate
.Ar state
.Op Ar volume | plex | subdisk | drive
.Xc
Set state without influencing other objects, for diagnostic purposes only.
.It Ic start
Read configuration from all vinum drives.
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Ar volume | plex | subdisk
.Xc
Allow the system to access the objects.
.It Xo
.Ic stop
.Op Fl f
.Op Ar volume | plex | subdisk
.Xc
Terminate access to the objects, or stop
.Nm
if no parameters are specified.
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
Create a striped volume from the specified drives.
.El
.Sh DESCRIPTION
.Nm
is a utility program to communicate with the
.Xr vinum 4
logical volume
manager.
.Nm
is designed either for interactive use, when started without command line
arguments, or to execute a single command if the command is supplied on the
command line. In interactive mode,
.Nm
maintains a command line history.
.Sh OPTIONS
.Nm
commands may optionally be followed by an option. Any of the following options
may be specified with any command, but in some cases the options are ignored.
For example, the
.Ic stop
command ignores the
.Fl v
and
.Fl V
options.
.Bl -tag -width indent
.It Fl f
The
.Fl f
.Pq Dq force
option overrides safety checks. Use with extreme care. This option is for
emergency use only. For example, the command
.Pp
.Dl rm -f myvolume
.Pp
removes
.Ar myvolume
even if it is open.
Any subsequent access to the volume will almost certainly
cause a panic.
.It Fl i Ar millisecs
When performing the
.Ic init
and
.Ic start
commands, wait
.Ar millisecs
milliseconds between copying each block. This lowers the load on the system.
.It Fl n Ar name
Use the
.Fl n
option to specify a volume name to the simplified configuration commands
.Ic concat , mirror
and
.Ic stripe .
.It Fl r
The
.Fl r
.Pq Dq recursive
option is used by the list commands to display information not
only about the specified objects, but also about subordinate objects. For
example, in conjunction with the
.Ic lv
command, the
.Fl r
option will also show information about the plexes and subdisks belonging to the
volume.
.It Fl s
The
.Fl s
.Pq Dq statistics
option is used by the list commands to display statistical information. The
.Ic mirror
command also uses this option to specify that it should create striped plexes.
.It Fl S Ar size
The
.Fl S
option specifies the transfer size for the
.Ic init
and
.Ic start
commands.
.It Fl v
The
.Fl v
.Pq Dq verbose
option can be used to request more detailed information.
.It Fl V
The
.Fl V
.Pq Dq Very verbose
option can be used to request more detailed information than the
.Fl v
option provides.
.It Fl w
The
.Fl w
.Pq Dq wait
option tells
.Nm
to wait for completion of commands which normally run in the background, such as
.Ic init .
.El
.Sh COMMANDS IN DETAIL
.Nm
commands perform the following functions:
.Pp
.Bl -tag -width indent -compact
.It Ic attach Ar plex volume Op Cm rename
.It Xo
.Ic attach Ar subdisk plex
.Op Ar offset
.Op Cm rename
.Xc
.Nm Ic attach
inserts the specified plex or subdisk in a volume or plex. In the case of a
subdisk, an offset in the plex may be specified.
If it is not, the subdisk will
be attached at the first possible location. After attaching a plex to a
non-empty volume,
.Nm
reintegrates the plex.
.Pp
If the keyword
.Cm rename
is specified,
.Nm
renames the object (and in the case of a plex, any subordinate subdisks) to fit
in with the default
.Nm
naming convention. To rename the object to any other name, use the
.Ic rename
command.
.Pp
A number of considerations apply to attaching subdisks:
.Bl -bullet
.It
Subdisks can normally only be attached to concatenated plexes.
.It
If a striped or RAID-5 plex is missing a subdisk (for example after drive
failure), it should be replaced by a subdisk of the same size only.
.It
In order to add further subdisks to a striped or RAID-5 plex, use the
.Fl f
(force) option. This will corrupt the data in the plex.
.\"No other attachment of
.\"subdisks is currently allowed for striped and RAID-5 plexes.
.It
For concatenated plexes, the
.Ar offset
parameter specifies the offset in blocks from the beginning of the plex. For
striped and RAID-5 plexes, it specifies the offset of the first block of the
subdisk: in other words, the offset is the numerical position of the subdisk
multiplied by the stripe size. For example, in a plex with stripe size 271k,
the first subdisk will have offset 0, the second offset 271k, the third 542k,
etc. This calculation ignores parity blocks in RAID-5 plexes.
.El
.Pp
.It Xo
.Ic checkparity
.Ar plex
.Op Fl f
.Op Fl v
.Xc
Check the parity blocks on the specified RAID-4 or RAID-5 plex. This operation
maintains a pointer in the plex, so it can be stopped and later restarted from
the same position if desired. In addition, this pointer is used by the
.Ic rebuildparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
.Pp
If the
.Fl f
flag is specified,
.Ic checkparity
starts checking at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic checkparity
prints a running progress report.
.Pp
.It Xo
.Ic concat
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic concat
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single concatenated plex. The largest
contiguous space available on each drive is used to create the subdisks for the
plex.
.Pp
Normally, the
.Ic concat
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option requests verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Xo
.Ic create
.Op Fl f
.Ar description-file
.Xc
.Nm Ic create
is used to create any object. In view of the relatively complicated
relationship and the potential dangers involved in creating a
.Nm
object, there is no interactive interface to this function. If you do not
specify a file name,
.Nm
starts an editor on a temporary file.
If the environment variable
.Ev EDITOR
is set,
.Nm
starts this editor. If not, it defaults to
.Nm vi .
See the section
.Sx CONFIGURATION FILE
below for more information on the format of
this file.
.Pp
Note that the
.Nm Ic create
function is additive: if you run it multiple times, you will create multiple
copies of all unnamed objects.
.Pp
Normally the
.Ic create
command will not change the names of existing
.Nm
drives, in order to avoid accidentally erasing them. The correct way to dispose
of unwanted
.Nm
drives is to reset the configuration with the
.Ic resetconfig
command. In some cases, however, it may be necessary to create new data on
.Nm
drives which can no longer be started. In this case, use the
.Ic create Fl f
command.
.Pp
.It Ic debug
.Nm Ic debug ,
without any arguments, is used to enter the remote kernel debugger. It is only
activated if
.Nm
is built with the
.Dv VINUMDEBUG
option. This command stops the execution of the operating system until the
kernel debugger is exited. If remote debugging is set and there is no remote
connection for a kernel debugger, it will be necessary to reset the system and
reboot in order to leave the debugger.
.Pp
.It Ic debug Ar flags
Set a bit mask of internal debugging flags. These will change without warning
as the product matures; to be certain, read the header file
.Aq Pa sys/dev/vinumvar.h .
The bit mask is composed of the following values:
.Bl -tag -width indent
.It Dv DEBUG_ADDRESSES Pq No 1
Show buffer information during requests.
.\".It Dv DEBUG_NUMOUTPUT Pq No 2
.\"Show the value of
.\".Va vp->v_numoutput .
.It Dv DEBUG_RESID Pq No 4
Go into debugger in
.Fn complete_rqe .
.It Dv DEBUG_LASTREQS Pq No 8
Keep a circular buffer of last requests.
.It Dv DEBUG_REVIVECONFLICT Pq No 16
Print info about revive conflicts.
.It Dv DEBUG_EOFINFO Pq No 32
Print information about internal state when returning an
.Dv EOF
on a striped plex.
.It Dv DEBUG_MEMFREE Pq No 64
Maintain a circular list of the last memory areas freed by the memory allocator.
.It Dv DEBUG_REMOTEGDB Pq No 256
Go into remote
.Nm gdb
when the
.Ic debug
command is issued.
.It Dv DEBUG_WARNINGS Pq No 512
Print some warnings about minor problems in the implementation.
.El
.Pp
.It Ic detach Oo Fl f Oc Ar plex
.It Ic detach Oo Fl f Oc Ar subdisk
.Nm Ic detach
removes the specified plex or subdisk from the volume or plex to which it is
attached. If removing the object would impair the data integrity of the volume,
the operation will fail unless the
.Fl f
option is specified. If the object is named after the object above it (for
example, subdisk
.Li vol1.p7.s0
attached to plex
.Li vol1.p7 ) ,
the name will be changed
by prepending the text
.Dq Li ex-
(for example,
.Li ex-vol1.p7.s0 ) .
If necessary, the name will be truncated in the
process.
.Pp
.Ic detach
does not reduce the number of subdisks in a striped or RAID-5 plex. Instead,
the subdisk is marked absent, and can later be replaced with the
.Ic attach
command.
.Pp
.It Ic dumpconfig Op Ar drive ...
.Pp
.Nm Ic dumpconfig
shows the configuration information stored on the specified drives. If no drive
names are specified,
.Ic dumpconfig
searches all drives on the system for Vinum partitions and dumps the
information. If configuration updates are disabled, it is possible that this
information is not the same as the information returned by the
.Ic list
command. This command is used primarily for maintenance and debugging.
.Pp
.It Ic info
.Nm Ic info
displays information about
.Nm
memory usage. This is intended primarily for debugging.
With the
.Fl v
option, it will give detailed information about the memory areas in use.
.Pp
With the
.Fl V
option,
.Ic info
displays information about up to the last 64 I/O requests handled by the
.Nm
driver. This information is only collected if debug flag 8 is set. The format
looks like:
.Bd -literal
vinum -> info -V
Flags: 0x200    1 opens
Total of 38 blocks malloced, total memory: 16460
Maximum allocs:  56, malloc table at 0xf0f72dbc

Time            Event     Buf        Dev   Offset   Bytes SD SDoff Doffset Goffset

14:40:00.637758 1VS Write 0xf2361f40 91.3  0x10     16384
14:40:00.639280 2LR Write 0xf2361f40 91.3  0x10     16384
14:40:00.639294 3RQ Read  0xf2361f40 4.39  0x104109 8192  19 0     0       0
14:40:00.639455 3RQ Read  0xf2361f40 4.23  0xd2109  8192  17 0     0       0
14:40:00.639529 3RQ Read  0xf2361f40 4.15  0x6e109  8192  16 0     0       0
14:40:00.652978 4DN Read  0xf2361f40 4.39  0x104109 8192  19 0     0       0
14:40:00.667040 4DN Read  0xf2361f40 4.15  0x6e109  8192  16 0     0       0
14:40:00.668556 4DN Read  0xf2361f40 4.23  0xd2109  8192  17 0     0       0
14:40:00.669777 6RP Write 0xf2361f40 4.39  0x104109 8192  19 0     0       0
14:40:00.685547 4DN Write 0xf2361f40 4.39  0x104109 8192  19 0     0       0
11:11:14.975184 Lock      0xc2374210 2     0x1f8001
11:11:15.018400 7VS Write 0xc2374210       0x7c0    32768 10
11:11:15.018456 8LR Write 0xc2374210 13.39 0xcc0c9  32768
11:11:15.046229 Unlock    0xc2374210 2     0x1f8001
.Ed
.Pp
The
.Ar Buf
field always contains the address of the user buffer header. This can be used
to identify the requests associated with a user request, although this is not
100% reliable: in theory, two requests in sequence could use the same buffer
header, but this is not common. The beginning of a request can be identified
by the event
.Ar 1VS
or
.Ar 7VS .
The first example above shows the requests involved in a user request. The
second is a subdisk I/O request with locking.
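.Pp
As a minimal sketch of how a log like the one shown above might be collected
in interactive mode, assuming a
.Nm
with debugging support and using the mask value 8
.Pq Dv DEBUG_LASTREQS
from the
.Ic debug Ar flags
list, the sequence could look like this.
Note that
.Ic debug Ar flags
sets the whole mask, so any other debug flags already in use would need to be
added to the value:
.Bd -literal -offset indent
vinum -> debug 8
vinum -> info -V
.Ed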
.Pp
The
.Ar Event
field contains information related to the sequence of events in the request
chain. The digit
.Ar 1
to
.Ar 8
indicates the approximate sequence of events, and the two-letter abbreviation is
a mnemonic for the location:
.Bl -tag -width Lockwait
.It 1VS
(vinumstrategy) shows information about the user request on entry to
.Fn vinumstrategy .
The device number is the
.Nm
device, and offset and length are the user parameters. This is always the
beginning of a request sequence.
.It 2LR
(launch_requests) shows the user request just prior to launching the low-level
.Nm
requests in the function
.Fn launch_requests .
The parameters should be the same as in the
.Ar 1VS
information.
.El
.Pp
In the following requests,
.Ar Dev
is the device number of the associated disk partition,
.Ar Offset
is the offset from the beginning of the partition,
.Ar SD
is the subdisk index in
.Va vinum_conf ,
.Ar SDoff
is the offset from the beginning of the subdisk,
.Ar Doffset
is the offset of the associated data request, and
.Ar Goffset
is the offset of the associated group request, where applicable.
.Bl -tag -width Lockwait
.It 3RQ
(request) shows one of possibly several low-level
.Nm
requests which are launched to satisfy the high-level request. This information
is also logged in
.Fn launch_requests .
.It 4DN
(done) is called from
.Fn complete_rqe ,
showing the completion of a request. This completion should match a request
launched either at stage
.Ar 3RQ
from
.Fn launch_requests ,
or from
.Fn complete_raid5_write
at stage
.Ar 5RD
or
.Ar 6RP .
.It 5RD
(RAID-5 data) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 data stripe after calculating
parity.
.It 6RP
(RAID-5 parity) is called from
.Fn complete_raid5_write
and represents the data written to a RAID-5 parity stripe after calculating
parity.
.It 7VS
shows a subdisk I/O request. These requests are usually internal to
.Nm
for operations like initialization or rebuilding plexes.
.It 8LR
shows the low-level operation generated for a subdisk I/O request.
.It Lockwait
specifies that the process is waiting for a range lock. The parameters are the
buffer header associated with the request, the plex number and the block number.
For internal reasons the block number is one higher than the address of the
beginning of the stripe.
.It Lock
specifies that a range lock has been obtained. The parameters are the same as
for
.Ar Lockwait .
.It Unlock
specifies that a range lock has been released. The parameters are the same as
for
.Ar Lockwait .
.El
.\" XXX
.Pp
.It Xo
.Ic init
.Op Fl S Ar size
.Op Fl w
.Ar plex | subdisk
.Xc
.Nm Ic init
initializes a subdisk by writing zeroes to it. You can initialize all subdisks
in a plex by specifying the plex name. This is the only way to ensure
consistent data in a plex. You must perform this initialization before using a
RAID-5 plex. It is also recommended for other new plexes.
.Nm
initializes all subdisks of a plex in parallel. Since this operation can take a
long time, it is normally performed in the background. If you want to wait for
completion of the command, use the
.Fl w
(wait) option.
.Pp
Specify the
.Fl S
option if you want to write blocks of a different size from the default value of
16 kB.
.Nm
prints a console message when the initialization is complete.
.Pp
.It Ic label Ar volume
The
.Ic label
command writes a
.Em ufs
style volume label on a volume. It is a simple alternative to an appropriate
call to
.Ic disklabel .
This is needed because some
.Em ufs
commands still read the disk to find the label instead of using the correct
.Xr ioctl 2
call to access it.
.Nm
maintains a volume label separately from the volume data, so this command is not
needed for
.Xr newfs 8 .
This command is deprecated.
.Pp
.It Xo
.Ic list
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic l
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume | plex | subdisk
.Xc
.It Xo
.Ic ld
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar drive
.Xc
.It Xo
.Ic ls
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar subdisk
.Xc
.It Xo
.Ic lp
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar plex
.Xc
.It Xo
.Ic lv
.Op Fl r
.Op Fl s
.Op Fl v
.Op Fl V
.Op Ar volume
.Xc
.Ic list
is used to show information about the specified object. If the argument is
omitted, information is shown about all objects known to
.Nm .
The
.Ic l
command is a synonym for
.Ic list .
.Pp
The
.Fl r
option relates to volumes and plexes: if specified, it recursively lists
information for the subdisks and (for a volume) plexes subordinate to the
objects. The commands
.Ic lv , lp , ls
and
.Ic ld
list only volumes, plexes, subdisks and drives respectively. This is
particularly useful when used without parameters.
.Pp
The
.Fl s
option causes
.Nm
to output device statistics, the
.Fl v
(verbose) option causes some additional information to be output, and the
.Fl V
option causes considerable additional information to be output.
.Pp
.It Ic makedev
The
.Ic makedev
command removes the directory
.Pa /dev/vinum
and recreates it with device nodes
which reflect the current configuration. This command is not intended for
general use, and is provided for emergency use only.
.Pp
.It Xo
.Ic mirror
.Op Fl f
.Op Fl n Ar name
.Op Fl s
.Op Fl v
.Ar drives
.Xc
The
.Ic mirror
command provides a simplified alternative to the
.Ic create
command for creating mirrored volumes. Without any options, it creates a RAID-1
(mirrored) volume with two concatenated plexes. The largest contiguous space
available on each drive is used to create the subdisks for the plexes. The
first plex is built from the odd-numbered drives in the list, and the second
plex is built from the even-numbered drives. If the drives are of different
sizes, the plexes may be of different sizes.
.Pp
If the
.Fl s
option is provided,
.Ic mirror
builds striped plexes with a stripe size of 256 kB. The size of the subdisks in
each plex is the size of the smallest contiguous storage available on any of the
drives which form the plex. Again, the plexes may differ in size.
.Pp
Normally, the
.Ic mirror
command creates an arbitrary name for the volume and its components. The name
is composed of the text
.Dq Li vinum
and a small integer, for example
.Dq Li vinum3 .
You can override this with the
.Fl n Ar name
option, which assigns the name specified to the volume. The plexes and subdisks
are named after the volume in the default manner.
.Pp
There is no choice of name for the drives. If the drives have already been
initialized as
.Nm
drives, the name remains. Otherwise the drives are given names starting with
the text
.Dq Li vinumdrive
and a small integer, for example
.Dq Li vinumdrive7 .
As with the
.Ic create
command, the
.Fl f
option can be used to specify that a previous name should be overwritten. The
.Fl v
option requests verbose output.
.Pp
See the section
.Sx SIMPLIFIED CONFIGURATION
below for some examples of this
command.
.Pp
.It Ic mv Fl f Ar drive object ...
.It Ic move Fl f Ar drive object ...
Move all the subdisks from the specified objects onto the new drive. The
objects may be subdisks, drives or plexes. When drives or plexes are specified,
all subdisks associated with the object are moved.
.Pp
The
.Fl f
option is required for this function, since it currently does not preserve the
data in the subdisk. This functionality will be added at a later date. In this
form, however, it is suited to recovering a failed disk drive.
.Pp
.It Ic printconfig Op Ar file
Write a copy of the current configuration to
.Ar file
in a format that can be used to recreate the
.Nm
configuration. Unlike the configuration saved on disk, it includes definitions
of the drives. If you omit
.Ar file ,
.Nm
writes the list to
.Dv stdout .
.Pp
.It Ic quit
Exit the
.Nm
program when running in interactive mode. Normally this would be done by
entering the
.Dv EOF
character.
.Pp
.It Ic read Ar disk ...
The
.Ic read
command scans the specified disks for
.Nm
partitions containing previously created configuration information. It reads
the configuration in order from the most recently updated to least recently
updated configuration.
.Nm
maintains an up-to-date copy of all configuration information on each disk
partition. You must specify all of the slices in a configuration as the
parameter to this command.
.Pp
The
.Ic read
command is intended to selectively load a
.Nm
configuration on a system which has other
.Nm
partitions. If you want to start all partitions on the system, it is easier to
use the
.Ic start
command.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk.
This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification). You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands. Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
.It Xo
.Ic rebuildparity
.Ar plex
.Op Fl f
.Op Fl v
.Op Fl V
.Xc
Rebuild the parity blocks on the specified RAID-4 or RAID-5 plex. This
operation maintains a pointer in the plex, so it can be stopped and later
restarted from the same position if desired. In addition, this pointer is used
by the
.Ic checkparity
command, so rebuilding the parity blocks need only start at the location where
the first parity problem has been detected.
.Pp
If the
.Fl f
flag is specified,
.Ic rebuildparity
starts rebuilding at the beginning of the plex. If the
.Fl v
flag is specified,
.Ic rebuildparity
first checks the existing parity blocks and prints information about those
found to be incorrect before rebuilding. If the
.Fl V
flag is specified,
.Ic rebuildparity
prints a running progress report.
.Pp
.It Xo
.Ic rename
.Op Fl r
.Op Ar drive | subdisk | plex | volume
.Ar newname
.Xc
Change the name of the specified object. If the
.Fl r
option is specified, subordinate objects will be named by the default rules:
plex names will be formed by appending
.Li .p Ns Ar number
to the volume name, and
subdisk names will be formed by appending
.Li .s Ns Ar number
to the plex name.
.\".Pp
.\".It Xo
.\".Ic replace
.\".Ar drive newdrive
.\"Move all the subdisks from the specified drive onto the new drive. This will
.\"attempt to recover those subdisks that can be recovered, and create the others
.\"from scratch.
.\"If the new drive lacks the space for this operation, as many
.\"subdisks as possible will be fitted onto the drive, and the rest will be left on
.\"the original drive.
.Pp
.It Ic resetconfig
The
.Ic resetconfig
command completely obliterates the
.Nm
configuration on a system. Use this command only when you want to completely
delete the configuration.
.Nm
will ask for confirmation; you must type in the words
.Li "NO FUTURE"
exactly as shown:
.Bd -unfilled -offset indent
.No # Nm Ic resetconfig

WARNING! This command will completely wipe out your vinum
configuration. All data will be lost. If you really want
to do this, enter the text

NO FUTURE
.No "Enter text ->" Sy "NO FUTURE"
Vinum configuration obliterated
.Ed
.Pp
As the message suggests, this is a last-ditch command. Don't use it unless you
have an existing configuration which you never want to see again.
.Pp
.It Xo
.Ic resetstats
.Op Fl r
.Op Ar volume | plex | subdisk
.Xc
.Nm
maintains a number of statistical counters for each object. See the header file
.Aq Pa sys/dev/vinumvar.h
for more information.
.\" XXX put it in here when it's finalized
Use the
.Ic resetstats
command to reset these counters. In conjunction with the
.Fl r
option,
.Nm
also resets the counters of subordinate objects.
.Pp
.It Xo
.Ic rm
.Op Fl f
.Op Fl r
.Ar volume | plex | subdisk
.Xc
.Ic rm
removes an object from the
.Nm
configuration. Once an object has been removed, there is no way to recover it.
Normally
.Nm
performs a large amount of consistency checking before removing an object. The
.Fl f
option tells
.Nm
to omit this checking and remove the object anyway. Use this option with great
care: it can result in total loss of data on a volume.
1138.Pp 1139Normally, 1140.Nm 1141refuses to remove a volume or plex if it has subordinate plexes or subdisks 1142respectively. You can tell 1143.Nm 1144to remove the object anyway by using the 1145.Fl f 1146option, or you can cause 1147.Nm 1148to remove the subordinate objects as well by using the 1149.Fl r 1150(recursive) option. If you remove a volume with the 1151.Fl r 1152option, it will remove both the plexes and the subdisks which belong to the 1153plexes. 1154.Pp 1155.It Ic saveconfig 1156Save the current configuration to disk. Normally this is not necessary, since 1157.Nm 1158automatically saves any change in configuration. If an error occurs on startup, 1159updates will be disabled. When you reenable them with the 1160.Ic setdaemon 1161command, 1162.Nm 1163does not automatically save the configuration to disk. Use this command to save 1164the configuration. 1165.\".Pp 1166.\".It Xo 1167.\".Ic set 1168.\".Op Fl f 1169.\".Ar state 1170.\".Ar volume | plex | subdisk | disk 1171.\".Xc 1172.\".Ic set 1173.\"sets the state of the specified object to one of the valid states (see 1174.\".Sx OBJECT STATES 1175.\"below). Normally 1176.\".Nm 1177.\"performs a large amount of consistency checking before making the change. The 1178.\".Fl f 1179.\"option tells 1180.\".Nm 1181.\"to omit this checking and perform the change anyway. Use this option with great 1182.\"care: it can result in total loss of data on a volume. 1183.Pp 1184.It Ic setdaemon Op Ar value 1185.Ic setdaemon 1186sets a variable bitmask for the 1187.Nm 1188daemon. This command is temporary and will be replaced. Currently, the bit mask 1189may contain the bits 1 (log every action to syslog) and 4 (don't update 1190configuration). Option bit 4 can be useful for error recovery. 1191.Pp 1192.It Xo 1193.Ic setstate Ar state 1194.Op Ar volume | plex | subdisk | drive 1195.Xc 1196.Ic setstate 1197sets the state of the specified objects to the specified state. 
This bypasses
the usual consistency mechanism of
.Nm
and should be used only for recovery purposes.  It is possible to crash the
system by incorrect use of this command.
.Pp
.It Xo
.Ic start
.Op Fl i Ar interval
.Op Fl S Ar size
.Op Fl w
.Op Ar plex | subdisk
.Xc
.Ic start
starts (brings into the
.Em up
state) one or more
.Nm
objects.
.Pp
If no object names are specified,
.Nm
scans the disks known to the system for
.Nm
drives and then reads in the configuration as described under the
.Ic read
command.  The
.Nm
drive contains a header with all information about the data stored on the drive,
including the names of the other drives which are required in order to represent
plexes and volumes.
.Pp
If
.Nm
encounters any errors during this command, it will turn off automatic
configuration update to avoid corrupting the copies on disk.  This will also
happen if the configuration on disk indicates a configuration error (for
example, subdisks which do not have a valid space specification).  You can turn
the updates on again with the
.Ic setdaemon
and
.Ic saveconfig
commands.  Reset bit 2 (numerical value 4) of the daemon options mask to
re-enable configuration saves.
.Pp
If object names are specified,
.Nm
starts them.  Normally this operation is only of use with subdisks.  The action
depends on the current state of the object:
.Bl -bullet
.It
If the object is already in the
.Em up
state,
.Nm
does nothing.
.It
If the object is a subdisk in the
.Em down
or
.Em reborn
states,
.Nm
changes it to the
.Em up
state.
.It
If the object is a subdisk in the
.Em empty
state, the change depends on the subdisk.
If it is part of a plex which is part 1267of a volume which contains other plexes, 1268.Nm 1269places the subdisk in the 1270.Em reviving 1271state and attempts to copy the data from the volume. When the operation 1272completes, the subdisk is set into the 1273.Em up 1274state. If it is part of a plex which is part of a volume which contains no 1275other plexes, or if it is not part of a plex, 1276.Nm 1277brings it into the 1278.Em up 1279state immediately. 1280.It 1281If the object is a subdisk in the 1282.Em reviving 1283state, 1284.Nm 1285continues the revive 1286operation offline. When the operation completes, the subdisk is set into the 1287.Em up 1288state. 1289.El 1290.Pp 1291When a subdisk comes into the 1292.Em up 1293state, 1294.Nm 1295automatically checks the state of any plex and volume to which it may belong and 1296changes their state where appropriate. 1297.Pp 1298If the object is a plex, 1299.Ic start 1300checks the state of the subordinate subdisks (and plexes in the case of a 1301volume) and starts any subdisks which can be started. 1302.Pp 1303To start a plex in a multi-plex volume, the data must be copied from another 1304plex in the volume. Since this frequently takes a long time, it is normally 1305done in the background. If you want to wait for this operation to complete (for 1306example, if you are performing this operation in a script), use the 1307.Fl w 1308option. 1309.Pp 1310Copying data doesn't just take a long time, it can also place a significant load 1311on the system. You can specify the transfer size in bytes or sectors with the 1312.Fl S 1313option, and an interval (in milliseconds) to wait between copying each block with 1314the 1315.Fl i 1316option. Both of these options lessen the load on the system. 1317.Pp 1318.It Xo 1319.Ic stop 1320.Op Fl f 1321.Op Ar volume | plex | subdisk 1322.Xc 1323If no parameters are specified, 1324.Ic stop 1325removes the 1326.Nm 1327KLD and stops 1328.Xr vinum 4 . 
This can only be done if no objects are active.  In particular, the
.Fl f
option does not override this requirement.  Normally, the
.Ic stop
command writes the current configuration back to the drives before terminating.
This is not possible if configuration updates are disabled, so in that case
.Nm
will not stop.  You can override this by
specifying the
.Fl f
option.
.Pp
The
.Ic stop
command can only work if
.Nm
has been loaded as a KLD, since it is not possible to unload a statically
configured driver.
.Nm Ic stop
will fail if
.Nm
is statically configured.
.Pp
If object names are specified,
.Ic stop
disables access to the objects.  If the objects have subordinate objects, the
subordinate objects must either already be inactive (stopped or in error), or
the
.Fl r
and
.Fl f
options must be specified.  This command does not remove the objects from the
configuration.  They can be accessed again after a
.Ic start
command.
.Pp
By default,
.Nm
does not stop active objects.  For example, you cannot stop a plex which is
attached to an active volume, and you cannot stop a volume which is open.  The
.Fl f
option tells
.Nm
to omit this checking and stop the object anyway.  Use this option with great
care and understanding: used incorrectly, it can result in serious data
corruption.
.Pp
.It Xo
.Ic stripe
.Op Fl f
.Op Fl n Ar name
.Op Fl v
.Ar drives
.Xc
The
.Ic stripe
command provides a simplified alternative to the
.Ic create
command for creating volumes with a single striped plex.  The size of the
subdisks is the size of the largest contiguous space available on all the
specified drives.  The stripe size is fixed at 256 kB.
1390.Pp 1391Normally, the 1392.Ic stripe 1393command creates an arbitrary name for the volume and its components. The name 1394is composed of the text 1395.Dq Li vinum 1396and a small integer, for example 1397.Dq Li vinum3 . 1398You can override this with the 1399.Fl n Ar name 1400option, which assigns the name specified to the volume. The plexes and subdisks 1401are named after the volume in the default manner. 1402.Pp 1403There is no choice of name for the drives. If the drives have already been 1404initialized as 1405.Nm 1406drives, the name remains. Otherwise the drives are given names starting with 1407the text 1408.Dq Li vinumdrive 1409and a small integer, for example 1410.Dq Li vinumdrive7 . 1411As with the 1412.Ic create 1413command, the 1414.Fl f 1415option can be used to specify that a previous name should be overwritten. The 1416.Fl v 1417is used to specify verbose output. 1418.Pp 1419See the section 1420.Sx SIMPLIFIED CONFIGURATION 1421below for some examples of this 1422command. 1423.El 1424.Sh SIMPLIFIED CONFIGURATION 1425This section describes a simplified interface to 1426.Nm 1427configuration using the 1428.Ic concat , 1429.Ic mirror 1430and 1431.Ic stripe 1432commands. These commands create convenient configurations for some more normal 1433situations, but they are not as flexible as the 1434.Ic create 1435command. 1436.Pp 1437See above for the description of the commands. Here are some examples, all 1438performed with the same collection of disks. Note that the first drive, 1439.Pa /dev/da1h , 1440is smaller than the others. This has an effect on the sizes chosen for each 1441kind of subdisk. 1442.Pp 1443The following examples all use the 1444.Fl v 1445option to show the commands passed to the system, and also to list the structure 1446of the volume. Without the 1447.Fl v 1448option, these commands produce no output. 
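.Pp
As a minimal sketch of the naming behaviour described for these commands, the
.Fl n
option replaces the generated name, and a subsequent
.Ic list
command displays the result, since the command itself prints nothing without
.Fl v .
The volume name
.Dq Li bigvol
is purely illustrative and is not used elsewhere in this manual:
.Bd -literal
vinum -> concat -n bigvol /dev/da1h /dev/da2h
vinum -> list
.Ed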
1449.Ss Volume with a single concatenated plex 1450Use a volume with a single concatenated plex for the largest possible storage 1451without resilience to drive failures: 1452.Bd -literal 1453vinum -> concat -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h 1454volume vinum0 1455 plex name vinum0.p0 org concat 1456drive vinumdrive0 device /dev/da1h 1457 sd name vinum0.p0.s0 drive vinumdrive0 size 0 1458drive vinumdrive1 device /dev/da2h 1459 sd name vinum0.p0.s1 drive vinumdrive1 size 0 1460drive vinumdrive2 device /dev/da3h 1461 sd name vinum0.p0.s2 drive vinumdrive2 size 0 1462drive vinumdrive3 device /dev/da4h 1463 sd name vinum0.p0.s3 drive vinumdrive3 size 0 1464V vinum0 State: up Plexes: 1 Size: 2134 MB 1465P vinum0.p0 C State: up Subdisks: 4 Size: 2134 MB 1466S vinum0.p0.s0 State: up PO: 0 B Size: 414 MB 1467S vinum0.p0.s1 State: up PO: 414 MB Size: 573 MB 1468S vinum0.p0.s2 State: up PO: 988 MB Size: 573 MB 1469S vinum0.p0.s3 State: up PO: 1561 MB Size: 573 MB 1470.Ed 1471.Pp 1472In this case, the complete space on all four disks was used, giving a volume 14732134 MB in size. 1474.Ss Volume with a single striped plex 1475A volume with a single striped plex may give better performance than a 1476concatenated plex, but restrictions on striped plexes can mean that the volume 1477is smaller. 
It will also not be resilient to a drive failure: 1478.Bd -literal 1479vinum -> stripe -v /dev/da1h /dev/da2h /dev/da3h /dev/da4h 1480drive vinumdrive0 device /dev/da1h 1481drive vinumdrive1 device /dev/da2h 1482drive vinumdrive2 device /dev/da3h 1483drive vinumdrive3 device /dev/da4h 1484volume vinum0 1485 plex name vinum0.p0 org striped 256k 1486 sd name vinum0.p0.s0 drive vinumdrive0 size 849825b 1487 sd name vinum0.p0.s1 drive vinumdrive1 size 849825b 1488 sd name vinum0.p0.s2 drive vinumdrive2 size 849825b 1489 sd name vinum0.p0.s3 drive vinumdrive3 size 849825b 1490V vinum0 State: up Plexes: 1 Size: 1659 MB 1491P vinum0.p0 S State: up Subdisks: 4 Size: 1659 MB 1492S vinum0.p0.s0 State: up PO: 0 B Size: 414 MB 1493S vinum0.p0.s1 State: up PO: 256 kB Size: 414 MB 1494S vinum0.p0.s2 State: up PO: 512 kB Size: 414 MB 1495S vinum0.p0.s3 State: up PO: 768 kB Size: 414 MB 1496.Ed 1497.Pp 1498In this case, the size of the subdisks has been limited to the smallest 1499available disk, so the resulting volume is only 1659 MB in size. 
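.Pp
The sizes in this listing can be checked by hand.
The following worked arithmetic assumes that the
.Dq Li b
suffix in the generated subdisk sizes denotes blocks of 512 bytes, as described
under
.Sx Scale factors ,
and that the listing rounds sizes down to whole megabytes; the rounding is an
inference from the numbers, not something the listing states:
.Bd -literal -offset indent
849825 blocks * 512 bytes    =  435110400 bytes  (414.95 MB, shown as 414 MB)
4 subdisks * 435110400 bytes = 1740441600 bytes (1659.81 MB, shown as 1659 MB)
.Ed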
1500.Ss Mirrored volume with two concatenated plexes 1501For more reliability, use a mirrored, concatenated volume: 1502.Bd -literal 1503vinum -> mirror -v -n mirror /dev/da1h /dev/da2h /dev/da3h /dev/da4h 1504drive vinumdrive0 device /dev/da1h 1505drive vinumdrive1 device /dev/da2h 1506drive vinumdrive2 device /dev/da3h 1507drive vinumdrive3 device /dev/da4h 1508volume mirror setupstate 1509 plex name mirror.p0 org concat 1510 sd name mirror.p0.s0 drive vinumdrive0 size 0b 1511 sd name mirror.p0.s1 drive vinumdrive2 size 0b 1512 plex name mirror.p1 org concat 1513 sd name mirror.p1.s0 drive vinumdrive1 size 0b 1514 sd name mirror.p1.s1 drive vinumdrive3 size 0b 1515V mirror State: up Plexes: 2 Size: 1146 MB 1516P mirror.p0 C State: up Subdisks: 2 Size: 988 MB 1517P mirror.p1 C State: up Subdisks: 2 Size: 1146 MB 1518S mirror.p0.s0 State: up PO: 0 B Size: 414 MB 1519S mirror.p0.s1 State: up PO: 414 MB Size: 573 MB 1520S mirror.p1.s0 State: up PO: 0 B Size: 573 MB 1521S mirror.p1.s1 State: up PO: 573 MB Size: 573 MB 1522.Ed 1523.Pp 1524This example specifies the name of the volume, 1525.Ar mirror . 1526Since one drive is smaller than the others, the two plexes are of different 1527size, and the last 158 MB of the volume is non-resilient. To ensure complete 1528reliability in such a situation, use the 1529.Ic create 1530command to create a volume with 988 MB. 
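.Pp
A configuration file for such a volume might look like the following sketch.
It assumes that the subdisk sizes printed in the striped examples of this
section (849825 and 1173665 blocks of 512 bytes) are the largest free areas on
the small and large drives respectively, and that the drives are otherwise
unused.
The volume name
.Dq Li mirror988
is hypothetical.
Each plex then holds 849825 + 1173665 = 2023490 sectors, or about 988 MB, so
both plexes cover the whole volume:
.Bd -literal -offset indent
# Sketch only: sizes are taken from the listings in this section
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume mirror988 setupstate
  plex org concat
    sd length 849825s drive vinumdrive0
    sd length 1173665s drive vinumdrive2
  plex org concat
    sd length 1173665s drive vinumdrive1
    sd length 849825s drive vinumdrive3
.Ed
.Pp
The
.Cm setupstate
keyword is copied from the configuration that the
.Ic mirror
command generates; omit it if the plexes should be brought to a consistent
state with the
.Ic start
command instead.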
.Ss Mirrored volume with two striped plexes
Alternatively, use the
.Fl s
option to create a mirrored volume with two striped plexes:
.Bd -literal
vinum -> mirror -v -n raid10 -s /dev/da1h /dev/da2h /dev/da3h /dev/da4h
drive vinumdrive0 device /dev/da1h
drive vinumdrive1 device /dev/da2h
drive vinumdrive2 device /dev/da3h
drive vinumdrive3 device /dev/da4h
volume raid10 setupstate
  plex name raid10.p0 org striped 256k
    sd name raid10.p0.s0 drive vinumdrive0 size 849825b
    sd name raid10.p0.s1 drive vinumdrive2 size 849825b
  plex name raid10.p1 org striped 256k
    sd name raid10.p1.s0 drive vinumdrive1 size 1173665b
    sd name raid10.p1.s1 drive vinumdrive3 size 1173665b
V raid10                State: up       Plexes:       2 Size:       1146 MB
P raid10.p0           S State: up       Subdisks:     2 Size:        829 MB
P raid10.p1           S State: up       Subdisks:     2 Size:       1146 MB
S raid10.p0.s0          State: up       PO:        0  B Size:        414 MB
S raid10.p0.s1          State: up       PO:      256 kB Size:        414 MB
S raid10.p1.s0          State: up       PO:        0  B Size:        573 MB
S raid10.p1.s1          State: up       PO:      256 kB Size:        573 MB
.Ed
.Pp
In this case, the usable part of the volume is even smaller, since the first
plex has shrunk to match the smallest drive.
.Sh CONFIGURATION FILE
.Nm
requires all parameters to the
.Ic create
command to be in a configuration file.  Entries in the configuration file
define volumes, plexes and subdisks, and may be in free format, except that each
entry must be on a single line.
.Ss Scale factors
Some configuration file parameters specify a size (lengths, stripe sizes).
These values can be specified as bytes, or one of the following scale factors
may be appended:
.Bl -tag -width indent
.It s
specifies that the value is a number of sectors of 512 bytes.
.It k
specifies that the value is a number of kilobytes (1024 bytes).
1575.It m 1576specifies that the value is a number of megabytes (1048576 bytes). 1577.It g 1578specifies that the value is a number of gigabytes (1073741824 bytes). 1579.It b 1580is used for compatibility with 1581.Tn VERITAS . 1582It stands for blocks of 512 bytes. 1583This abbreviation is confusing, since the word 1584.Dq block 1585is used in different 1586meanings, and its use is deprecated. 1587.El 1588.Pp 1589For example, the value 16777216 bytes can also be written as 1590.Em 16m , 1591.Em 16384k 1592or 1593.Em 32768s . 1594.Pp 1595The configuration file can contain the following entries: 1596.Bl -tag -width 4n 1597.It Ic drive Ar name devicename Op Ar options 1598Define a drive. The options are: 1599.Bl -tag -width 18n 1600.It Cm device Ar devicename 1601Specify the device on which the drive resides. 1602.Ar devicename 1603must be the name of a disk partition, for example 1604.Pa /dev/da1e 1605or 1606.Pa /dev/ad3s2h , 1607and it must be of type 1608.Em vinum . 1609Do not use the 1610.Dq Li c 1611partition, which is reserved for the complete disk. 1612.It Cm hotspare 1613Define the drive to be a 1614.Dq hot spare 1615drive, which is maintained to automatically replace a failed drive. 1616.Nm 1617does not allow this drive to be used for any other purpose. In particular, it 1618is not possible to create subdisks on it. This functionality has not been 1619completely implemented. 1620.El 1621.It Ic volume Ar name Op Ar options 1622Define a volume with name 1623.Ar name . 1624Options are: 1625.Bl -tag -width 18n 1626.It Cm plex Ar plexname 1627Add the specified plex to the volume. If 1628.Ar plexname 1629is specified as 1630.Cm * , 1631.Nm 1632will look for the definition of the plex as the next possible entry in the 1633configuration file after the definition of the volume. 1634.It Cm readpol Ar policy 1635Define a 1636.Em read policy 1637for the volume. 1638.Ar policy 1639may be either 1640.Cm round 1641or 1642.Cm prefer Ar plexname . 
1643.Nm 1644satisfies a read request from only one of the plexes. A 1645.Cm round 1646read policy specifies that each read should be performed from a different plex 1647in 1648.Em round-robin 1649fashion. A 1650.Cm prefer 1651read policy reads from the specified plex every time. 1652.It Cm setupstate 1653When creating a multi-plex volume, assume that the contents of all the plexes 1654are consistent. This is normally not the case, so by default 1655.Nm 1656sets all plexes except the first one to the 1657.Em faulty 1658state. Use the 1659.Ic start 1660command to first bring them to a consistent state. In the case of striped and 1661concatenated plexes, however, it does not normally cause problems to leave them 1662inconsistent: when using a volume for a file system or a swap partition, the 1663previous contents of the disks are not of interest, so they may be ignored. 1664If you want to take this risk, use the 1665.Cm setupstate 1666keyword. It will only apply to the plexes defined immediately after the volume 1667in the configuration file. If you add plexes to a volume at a later time, you 1668must integrate them manually with the 1669.Ic start 1670command. 1671.Pp 1672Note that you 1673.Em must 1674use the 1675.Ic init 1676command with RAID-5 plexes: otherwise extreme data corruption will result if one 1677subdisk fails. 1678.El 1679.It Ic plex Op Ar options 1680Define a plex. Unlike a volume, a plex does not need a name. The options may 1681be: 1682.Bl -tag -width 18n 1683.It Cm name Ar plexname 1684Specify the name of the plex. Note that you must use the keyword 1685.Cm name 1686when naming a plex or subdisk. 1687.It Cm org Ar organization Op Ar stripesize 1688Specify the organization of the plex. 1689.Ar organization 1690can be one of 1691.Cm concat , striped 1692or 1693.Cm raid5 . 1694For 1695.Cm striped 1696and 1697.Cm raid5 1698plexes, the parameter 1699.Ar stripesize 1700must be specified, while for 1701.Cm concat 1702it must be omitted. 
For type 1703.Cm striped , 1704it specifies the width of each stripe. For type 1705.Cm raid5 , 1706it specifies the size of a group. A group is a portion of a plex which 1707stores the parity bits all in the same subdisk. It must be a factor of the plex size (in 1708other words, the result of dividing the plex size by the stripe size must be an 1709integer), and it must be a multiple of a disk sector (512 bytes). 1710.Pp 1711For optimum performance, stripes should be at least 128 kB in size: anything 1712smaller will result in a significant increase in I/O activity due to mapping of 1713individual requests over multiple disks. The performance improvement due to the 1714increased number of concurrent transfers caused by this mapping will not make up 1715for the performance drop due to the increase in latency. A good guideline for 1716stripe size is between 256 kB and 512 kB. Avoid powers of 2, however: they tend 1717to cause all superblocks to be placed on the first subdisk. 1718.Pp 1719A striped plex must have at least two subdisks (otherwise it is a concatenated 1720plex), and each must be the same size. A RAID-5 plex must have at least three 1721subdisks, and each must be the same size. In practice, a RAID-5 plex should 1722have at least 5 subdisks. 1723.It Cm volume Ar volname 1724Add the plex to the specified volume. If no 1725.Cm volume 1726keyword is specified, the plex will be added to the last volume mentioned in the 1727configuration file. 1728.It Cm sd Ar sdname offset 1729Add the specified subdisk to the plex at offset 1730.Ar offset . 1731.El 1732.It Ic subdisk Op Ar options 1733Define a subdisk. Options may be: 1734.Bl -hang -width 18n 1735.It Cm name Ar name 1736Specify the name of a subdisk. It is not necessary to specify a name for a 1737subdisk, see 1738.Sx OBJECT NAMING 1739above. Note that you must specify the keyword 1740.Cm name 1741if you wish to name a subdisk. 
1742.It Cm plexoffset Ar offset 1743Specify the starting offset of the subdisk in the plex. If not specified, 1744.Nm 1745allocates the space immediately after the previous subdisk, if any, or otherwise 1746at the beginning of the plex. 1747.It Cm driveoffset Ar offset 1748Specify the starting offset of the subdisk in the drive. If not specified, 1749.Nm 1750allocates the first contiguous 1751.Ar length 1752bytes of free space on the drive. 1753.It Cm length Ar length 1754Specify the length of the subdisk. This keyword must be specified. There is no 1755default, but the value 0 may be specified to mean 1756.Dq "use the largest available contiguous free area on the drive" . 1757If the drive is empty, this means that the entire drive will be used for the 1758subdisk. 1759.Cm length 1760may be shortened to 1761.Cm len . 1762.It Cm plex Ar plex 1763Specify the plex to which the subdisk belongs. By default, the subdisk belongs 1764to the last plex specified. 1765.It Cm drive Ar drive 1766Specify the drive on which the subdisk resides. By default, the subdisk resides 1767on the last drive specified. 
1768.El 1769.El 1770.Sh EXAMPLE CONFIGURATION FILE 1771.Bd -literal 1772# Sample vinum configuration file 1773# 1774# Our drives 1775drive drive1 device /dev/da1h 1776drive drive2 device /dev/da2h 1777drive drive3 device /dev/da3h 1778drive drive4 device /dev/da4h 1779drive drive5 device /dev/da5h 1780drive drive6 device /dev/da6h 1781# A volume with one striped plex 1782volume tinyvol 1783 plex org striped 512b 1784 sd length 64m drive drive2 1785 sd length 64m drive drive4 1786volume stripe 1787 plex org striped 512b 1788 sd length 512m drive drive2 1789 sd length 512m drive drive4 1790# Two plexes 1791volume concat 1792 plex org concat 1793 sd length 100m drive drive2 1794 sd length 50m drive drive4 1795 plex org concat 1796 sd length 150m drive drive4 1797# A volume with one striped plex and one concatenated plex 1798volume strcon 1799 plex org striped 512b 1800 sd length 100m drive drive2 1801 sd length 100m drive drive4 1802 plex org concat 1803 sd length 150m drive drive2 1804 sd length 50m drive drive4 1805# a volume with a RAID-5 and a striped plex 1806# note that the RAID-5 volume is longer by 1807# the length of one subdisk 1808volume vol5 1809 plex org striped 64k 1810 sd length 1000m drive drive2 1811 sd length 1000m drive drive4 1812 plex org raid5 32k 1813 sd length 500m drive drive1 1814 sd length 500m drive drive2 1815 sd length 500m drive drive3 1816 sd length 500m drive drive4 1817 sd length 500m drive drive5 1818.Ed 1819.Sh DRIVE LAYOUT CONSIDERATIONS 1820.Nm 1821drives are currently 1822.Bx 1823disk partitions. They must be of type 1824.Em vinum 1825in order to avoid overwriting data used for other purposes. Use 1826.Nm disklabel Fl e 1827to edit a partition type definition. The following display shows a typical 1828partition layout as shown by 1829.Xr disklabel 8 : 1830.Bd -literal 18318 partitions: 1832# size offset fstype [fsize bsize bps/cpg] 1833 a: 81920 344064 4.2BSD 0 0 0 # (Cyl. 240*- 297*) 1834 b: 262144 81920 swap # (Cyl. 
57*- 240*) 1835 c: 4226725 0 unused 0 0 # (Cyl. 0 - 2955*) 1836 e: 81920 0 4.2BSD 0 0 0 # (Cyl. 0 - 57*) 1837 f: 1900000 425984 4.2BSD 0 0 0 # (Cyl. 297*- 1626*) 1838 g: 1900741 2325984 vinum 0 0 0 # (Cyl. 1626*- 2955*) 1839.Ed 1840.Pp 1841In this example, partition 1842.Dq Li g 1843may be used as a 1844.Nm 1845partition. Partitions 1846.Dq Li a , 1847.Dq Li e 1848and 1849.Dq Li f 1850may be used as 1851.Em UFS 1852file systems or 1853.Em ccd 1854partitions. Partition 1855.Dq Li b 1856is a swap partition, and partition 1857.Dq Li c 1858represents the whole disk and should not be used for any other purpose. 1859.Pp 1860.Nm 1861uses the first 265 sectors on each partition for configuration information, so 1862the maximum size of a subdisk is 265 sectors smaller than the drive. 1863.Sh LOG FILE 1864.Nm 1865maintains a log file, by default 1866.Pa /var/tmp/vinum_history , 1867in which it keeps track of the commands issued to 1868.Nm . 1869You can override the name of this file by setting the environment variable 1870.Ev VINUM_HISTORY 1871to the name of the file. 1872.Pp 1873Each message in the log file is preceded by a date. The default format is 1874.Qq Li %e %b %Y %H:%M:%S . 1875See 1876.Xr strftime 3 1877for further details of the format string. It can be overridden by the 1878environment variable 1879.Ev VINUM_DATEFORMAT . 1880.Sh HOW TO SET UP VINUM 1881This section gives practical advice about how to implement a 1882.Nm 1883system. 1884.Ss Where to put the data 1885The first choice you need to make is where to put the data. You need dedicated 1886disk partitions for 1887.Nm . 1888They should be partitions, not devices, and they should not be partition 1889.Dq Li c . 1890For example, good names are 1891.Pa /dev/da0e 1892or 1893.Pa /dev/ad3s4a . 1894Bad names are 1895.Pa /dev/da0 1896and 1897.Pa /dev/da0s1 , 1898both of which represent a device, not a partition, and 1899.Pa /dev/ad1c , 1900which represents a complete disk and should be of type 1901.Em unused . 
1902See the example under 1903.Sx DRIVE LAYOUT CONSIDERATIONS 1904above. 1905.Ss Designing volumes 1906The way you set up 1907.Nm 1908volumes depends on your intentions. There are a number of possibilities: 1909.Bl -enum 1910.It 1911You may want to join up a number of small disks to make a reasonable sized file 1912system. For example, if you had five small drives and wanted to use all the 1913space for a single volume, you might write a configuration file like: 1914.Bd -literal -offset indent 1915drive d1 device /dev/da2e 1916drive d2 device /dev/da3e 1917drive d3 device /dev/da4e 1918drive d4 device /dev/da5e 1919drive d5 device /dev/da6e 1920volume bigger 1921 plex org concat 1922 sd length 0 drive d1 1923 sd length 0 drive d2 1924 sd length 0 drive d3 1925 sd length 0 drive d4 1926 sd length 0 drive d5 1927.Ed 1928.Pp 1929In this case, you specify the length of the subdisks as 0, which means 1930.Dq "use the largest area of free space that you can find on the drive" . 1931If the subdisk is the only subdisk on the drive, it will use all available 1932space. 1933.It 1934You want to set up 1935.Nm 1936to obtain additional resilience against disk failures. You have the choice of 1937RAID-1, also called 1938.Dq mirroring , 1939or RAID-5, also called 1940.Dq parity . 1941.Pp 1942To set up mirroring, create multiple plexes in a volume. For example, to create 1943a mirrored volume of 2 GB, you might create the following configuration file: 1944.Bd -literal -offset indent 1945drive d1 device /dev/da2e 1946drive d2 device /dev/da3e 1947volume mirror 1948 plex org concat 1949 sd length 2g drive d1 1950 plex org concat 1951 sd length 2g drive d2 1952.Ed 1953.Pp 1954When creating mirrored drives, it is important to ensure that the data from each 1955plex is on a different physical disk so that 1956.Nm 1957can access the complete address space of the volume even if a drive fails. 
1958Note that each plex requires as much data as the complete volume: in this 1959example, the volume has a size of 2 GB, but each plex (and each subdisk) 1960requires 2 GB, so the total disk storage requirement is 4 GB. 1961.Pp 1962To set up RAID-5, create a single plex of type 1963.Cm raid5 . 1964For example, to create an equivalent resilient volume of 2 GB, you might use the 1965following configuration file: 1966.Bd -literal -offset indent 1967drive d1 device /dev/da2e 1968drive d2 device /dev/da3e 1969drive d3 device /dev/da4e 1970drive d4 device /dev/da5e 1971drive d5 device /dev/da6e 1972volume raid 1973 plex org raid5 512k 1974 sd length 512m drive d1 1975 sd length 512m drive d2 1976 sd length 512m drive d3 1977 sd length 512m drive d4 1978 sd length 512m drive d5 1979.Ed 1980.Pp 1981RAID-5 plexes require at least three subdisks, one of which is used for storing 1982parity information and is lost for data storage. The more disks you use, the 1983greater the proportion of the disk storage can be used for data storage. In 1984this example, the total storage usage is 2.5 GB, compared to 4 GB for a mirrored 1985configuration. If you were to use the minimum of only three disks, you would 1986require 3 GB to store the information, for example: 1987.Bd -literal -offset indent 1988drive d1 device /dev/da2e 1989drive d2 device /dev/da3e 1990drive d3 device /dev/da4e 1991volume raid 1992 plex org raid5 512k 1993 sd length 1g drive d1 1994 sd length 1g drive d2 1995 sd length 1g drive d3 1996.Ed 1997.Pp 1998As with creating mirrored drives, it is important to ensure that the data from 1999each subdisk is on a different physical disk so that 2000.Nm 2001can access the complete address space of the volume even if a drive fails. 2002.It 2003You want to set up 2004.Nm 2005to allow more concurrent access to a file system. In many cases, access to a 2006file system is limited by the speed of the disk. 
By spreading the volume across 2007multiple disks, you can increase the throughput in multi-access environments. 2008This technique shows little or no performance improvement in single-access 2009environments. 2010.Nm 2011uses a technique called 2012.Dq striping , 2013or sometimes RAID-0, to increase this concurrency of access. The name RAID-0 is 2014misleading: striping does not provide any redundancy or additional reliability. 2015In fact, it decreases the reliability, since the failure of a single disk will 2016render the volume useless, and the more disks you have, the more likely it is 2017that one of them will fail. 2018.Pp 2019To implement striping, use a 2020.Cm striped 2021plex: 2022.Bd -literal -offset indent 2023drive d1 device /dev/da2e 2024drive d2 device /dev/da3e 2025drive d3 device /dev/da4e 2026drive d4 device /dev/da5e 2027volume raid 2028 plex org striped 512k 2029 sd length 512m drive d1 2030 sd length 512m drive d2 2031 sd length 512m drive d3 2032 sd length 512m drive d4 2033.Ed 2034.Pp 2035A striped plex must have at least two subdisks, but the increase in performance 2036is greater if you have a larger number of disks. 2037.It 2038You may want to have the best of both worlds and have both resilience and 2039performance. This is sometimes called RAID-10 (a combination of RAID-1 and 2040RAID-0), though again this name is misleading. 
With 2041.Nm 2042you can do this with the following configuration file: 2043.Bd -literal -offset indent 2044drive d1 device /dev/da2e 2045drive d2 device /dev/da3e 2046drive d3 device /dev/da4e 2047drive d4 device /dev/da5e 2048volume raid setupstate 2049 plex org striped 512k 2050 sd length 512m drive d1 2051 sd length 512m drive d2 2052 sd length 512m drive d3 2053 sd length 512m drive d4 2054 plex org striped 512k 2055 sd length 512m drive d4 2056 sd length 512m drive d3 2057 sd length 512m drive d2 2058 sd length 512m drive d1 2059.Ed 2060.Pp 2061Here the plexes are striped, increasing performance, and there are two of them, 2062increasing reliability. Note that this example shows the subdisks of the second 2063plex in reverse order from the first plex. This is for performance reasons and 2064will be discussed below. In addition, the volume specification includes the 2065keyword 2066.Cm setupstate , 2067which ensures that all plexes are 2068.Em up 2069after creation. 2070.El 2071.Ss Creating the volumes 2072Once you have created your configuration files, start 2073.Nm 2074and create the volumes. 
In this example, the configuration is in the file
.Pa configfile :
.Bd -literal -offset 2n
# vinum create -v configfile
   1: drive d1 device /dev/da2e
   2: drive d2 device /dev/da3e
   3: volume mirror
   4:   plex org concat
   5:     sd length 2g drive d1
   6:   plex org concat
   7:     sd length 2g drive d2
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         2 (8 configured)
Subdisks:       2 (16 configured)

Drive d1:       Device /dev/da2e
                Created on vinum.lemis.com at Tue Mar 23 12:30:31 1999
                Config last updated Tue Mar 23 14:30:32 1999
                Size: 60105216000 bytes (57320 MB)
                Used: 2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none
Drive d2:       Device /dev/da3e
                Created on vinum.lemis.com at Tue Mar 23 12:30:32 1999
                Config last updated Tue Mar 23 14:30:33 1999
                Size: 60105216000 bytes (57320 MB)
                Used: 2147619328 bytes (2048 MB)
                Available: 57957596672 bytes (55272 MB)
                State: up
                Last error: none

Volume mirror:  Size: 2147483648 bytes (2048 MB)
                State: up
                Flags:
                2 plexes
                Read policy: round robin

Plex mirror.p0: Size: 2147483648 bytes (2048 MB)
                Subdisks: 1
                State: up
                Organization: concat
                Part of volume mirror
Plex mirror.p1: Size: 2147483648 bytes (2048 MB)
                Subdisks: 1
                State: up
                Organization: concat
                Part of volume mirror

Subdisk mirror.p0.s0:
                Size: 2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p0 at offset 0

Subdisk mirror.p1.s0:
                Size: 2147483648 bytes (2048 MB)
                State: up
                Plex mirror.p1 at offset 0
.Ed
.Pp
The
.Fl v
option tells
.Nm
to list the file as it configures. Subsequently it lists the current
configuration in the same format as the
.Ic list Fl v
command.
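.Pp
Since the listing format is the same, the verbose configuration can presumably
be displayed again at any later time by running the
.Ic list
command on its own with the
.Fl v
option, for example:
.Pp
.Dl "# vinum list -v"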
.Ss Creating more volumes
Once you have created the
.Nm
volumes,
.Nm
keeps track of them in its internal configuration files. You do not need to
create them again. In particular, if you run the
.Ic create
command again, you will create additional objects:
.Bd -literal
# vinum create sampleconfig
Configuration summary

Drives:         2 (4 configured)
Volumes:        1 (4 configured)
Plexes:         4 (8 configured)
Subdisks:       4 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 53224/57320 MB (92%)
D d2                    State: up       Device /dev/da3e        Avail: 53224/57320 MB (92%)

V mirror                State: up       Plexes:       4 Size:       2048 MB

P mirror.p0           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: up       Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: up       Subdisks:     1 Size:       2048 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
.Ed
.Pp
As this example (this time without the
.Fl v
option) shows, re-running the
.Ic create
command has created two new plexes, each with a new subdisk, so that the volume
now has four plexes. If you want to add other
volumes, create new configuration files for them. They do not need to reference
the drives that
.Nm
already knows about.
For example, to create a volume
.Pa raid
on the four drives
.Pa /dev/da1e , /dev/da2e , /dev/da3e
and
.Pa /dev/da4e ,
you only need to mention the other two:
.Bd -literal -offset indent
drive d3 device /dev/da1e
drive d4 device /dev/da4e
volume raid
  plex org raid5 512k
    sd size 2g drive d1
    sd size 2g drive d2
    sd size 2g drive d3
    sd size 2g drive d4
.Ed
.Pp
With this configuration file, we get:
.Bd -literal
# vinum create newconfig
Configuration summary

Drives:         4 (4 configured)
Volumes:        2 (4 configured)
Plexes:         5 (8 configured)
Subdisks:       8 (16 configured)

D d1                    State: up       Device /dev/da2e        Avail: 51176/57320 MB (89%)
D d2                    State: up       Device /dev/da3e        Avail: 53220/57320 MB (89%)
D d3                    State: up       Device /dev/da1e        Avail: 53224/57320 MB (92%)
D d4                    State: up       Device /dev/da4e        Avail: 53224/57320 MB (92%)

V mirror                State: down     Plexes:       4 Size:       2048 MB
V raid                  State: down     Plexes:       1 Size:       6144 MB

P mirror.p0           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p1           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p2           C State: init     Subdisks:     1 Size:       2048 MB
P mirror.p3           C State: init     Subdisks:     1 Size:       2048 MB
P raid.p0            R5 State: init     Subdisks:     4 Size:       6144 MB

S mirror.p0.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p1.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p2.s0          State: up       PO:        0  B Size:       2048 MB
S mirror.p3.s0          State: up       PO:        0  B Size:       2048 MB
S raid.p0.s0            State: empty    PO:        0  B Size:       2048 MB
S raid.p0.s1            State: empty    PO:      512 kB Size:       2048 MB
S raid.p0.s2            State: empty    PO:     1024 kB Size:       2048 MB
S raid.p0.s3            State: empty    PO:     1536 kB Size:       2048 MB
.Ed
.Pp
Note the size of the RAID-5 plex: it is only 6 GB, although together its
components use 8 GB of disk space. This is because the equivalent of one
subdisk is used for storing parity data.
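.Pp
As a rough check of these figures, assume that all N subdisks of a RAID-5 plex
have the same length L (N and L are symbols introduced here for illustration
only). The usable size of the plex is then (N - 1) * L. For the four 2 GB
subdisks in this example:
.Bd -literal -offset indent
usable size = (N - 1) * L = (4 - 1) * 2048 MB = 6144 MB
parity      =           L =           2048 MB
total used  =       N * L =       4 * 2048 MB = 8192 MB
.Ed
.Pp
The usable size matches the 6144 MB reported for
.Pa raid.p0
in the listing above.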
.Ss Restarting Vinum
On rebooting the system, start
.Nm
with the
.Ic start
command:
.Pp
.Dl "# vinum start"
.Pp
This will start all the
.Nm
drives in the system. If for some reason you wish to start only some of them,
use the
.Ic read
command.
.Ss Performance considerations
A number of misconceptions exist about how to set up a RAID array for best
performance. In particular, most systems use far too small a stripe size. The
following discussion applies to all RAID systems, not just to
.Nm .
.Pp
The
.Dx
block I/O system issues requests of between 0.5 kB and 128 kB; a
typical mix is somewhere around 8 kB. You can't stop any striping system from
breaking a request into two physical requests, and if you make the stripe small
enough, it can be broken into several. This will result in a significant drop
in performance: the decrease in transfer time per disk is offset by the
order-of-magnitude greater increase in latency.
.Pp
With modern disk sizes and the
.Dx
I/O system, you can expect to have a
reasonably small number of fragmented requests with a stripe size between 256 kB
and 512 kB; with correct RAID implementations there is no obvious reason not to
increase the size to 2 or 4 MB on a large disk.
.Pp
When choosing a stripe size, consider that most current UFS file systems have
cylinder groups 32 MB in size. If you have a stripe size and number of disks
both of which are a power of two, it is probable that all superblocks and inodes
will be placed on the same subdisk, which will degrade performance significantly.
Choose an odd number instead, for example 479 kB.
.Pp
The easiest way to consider the impact of any transfer in a multi-access system
is to look at it from the point of view of the potential bottleneck, the disk
subsystem: how much total disk time does the transfer use?
Since just about
everything is cached, the time relationship between the request and its
completion is not so important: the important parameter is the total time that
the request keeps the disks active, the time when the disks are not available to
perform other transfers. As a result, it doesn't really matter if the transfers
are happening at the same time or different times. In practical terms, the time
we're looking at is the sum of the total latency (positioning time and
rotational latency, or the time it takes for the data to arrive under the disk
heads) and the total transfer time. For a given transfer to disks of the same
speed, the transfer time depends only on the total size of the transfer.
.Pp
Consider a typical news article or web page of 24 kB, which will probably be
read in a single I/O. Take disks with a transfer rate of 6 MB/s and an average
positioning time of 8 ms, and a file system with 4 kB blocks. Since it's 24 kB,
we don't have to worry about fragments, so the file will start on a 4 kB
boundary. The number of transfers required depends on where the block starts:
averaged over all possible starting positions within a stripe, it's
(S + F - 1) / S, where S is the stripe size in file system blocks, and F is
the file size in file system blocks. Each transfer costs one positioning time
of 8 ms, and at 6 MB/s the 24 kB of data takes about 4 ms to transfer,
regardless of the stripe size.
.Bl -enum
.It
Stripe size of 4 kB. You'll have 6 transfers. Total subsystem load: 48 ms
latency, 4 ms transfer, 52 ms total.
.It
Stripe size of 8 kB. On average, you'll have 3.5 transfers. Total subsystem
load: 28 ms latency, 4 ms transfer, 32 ms total.
.It
Stripe size of 16 kB. On average, you'll have 2.25 transfers. Total subsystem
load: 18 ms latency, 4 ms transfer, 22 ms total.
.It
Stripe size of 256 kB. On average, you'll have 1.08 transfers. Total subsystem
load: 8.6 ms latency, 4 ms transfer, 12.6 ms total.
.It
Stripe size of 4 MB. On average, you'll have 1.005 transfers. Total subsystem
load: 8.04 ms latency, 4 ms transfer, 12.04 ms total.
.El
.Pp
It appears that some hardware RAID systems have problems with large stripes:
they appear to always transfer a complete stripe to or from disk, so that a
large stripe size will have an adverse effect on performance.
.Nm
does not suffer from this problem: it optimizes all disk transfers and does not
transfer unneeded data.
.Pp
Note that no well-known benchmark program tests true multi-access conditions
(more than 100 concurrent users), so it is difficult to demonstrate the validity
of these statements.
.Pp
Given these considerations, the following factors affect the performance of a
.Nm
volume:
.Bl -bullet
.It
Striping improves performance for multiple access only, since it increases the
chance of individual requests being on different drives.
.It
Concatenating UFS file systems across multiple drives can also improve
performance for multiple file access, since UFS divides a file system into
cylinder groups and attempts to keep files in a single cylinder group. In
general, it is not as effective as striping.
.It
Mirroring can improve multi-access performance for reads, since by default
.Nm
issues consecutive reads to consecutive plexes.
.It
Mirroring decreases performance for all writes, whether multi-access or single
access, since the data must be written to both plexes. This explains the
subdisk layout in the example of a mirroring configuration above: if the
corresponding subdisk in each plex is on a different physical disk, the write
commands can be issued in parallel, whereas if they are on the same physical
disk, they will be performed sequentially.
.It
RAID-5 reads have essentially the same considerations as striped reads, unless
the striped plex is part of a mirrored volume, in which case the performance of
the mirrored volume will be better.
.It
RAID-5 writes are approximately 25% of the speed of striped writes: to perform
the write,
.Nm
must first read the data block and the corresponding parity block, perform some
calculations and write back the parity block and the data block, four times as
many transfers as for writing a striped plex. On the other hand, this is offset
by the cost of mirroring, so writes to a volume with a single RAID-5 plex are
approximately half the speed of writes to a correctly configured volume with two
striped plexes.
.It
When the
.Nm
configuration changes (for example, adding or removing objects, or the change of
state of one of the objects),
.Nm
writes up to 128 kB of updated configuration to each drive. The larger the
number of drives, the longer this takes.
.El
.Ss Creating file systems on Vinum volumes
You do not need to run
.Xr disklabel 8
before creating a file system on a
.Nm
volume. Just run
.Xr newfs 8 .
Use the
.Fl v
option to state that the device is not divided into partitions. For example, to
create a file system on volume
.Pa mirror ,
enter the following command:
.Pp
.Dl "# newfs -v /dev/vinum/mirror"
.Pp
A number of other considerations apply to
.Nm
configuration:
.Bl -bullet
.It
There is no advantage in creating multiple drives on a single disk. Each drive
uses 131.5 kB of data for label and configuration information, and performance
will suffer when the configuration changes. Use appropriately sized subdisks instead.
.It
It is possible to increase the size of a concatenated
.Nm
plex, but currently the size of striped and RAID-5 plexes cannot be increased.
Currently the size of an existing UFS file system also cannot be increased, but
it is planned to make both plexes and file systems extensible.
.El
.Sh STATE MANAGEMENT
Vinum objects have the concept of
.Em state .
See
.Xr vinum 4
for more details. They are only completely accessible if their state is
.Em up .
To change an object state to
.Em up ,
use the
.Ic start
command. To change an object state to
.Em down ,
use the
.Ic stop
command. Normally other states are created automatically by the relationship
between objects. For example, if you add a plex to a volume, the subdisks of
the plex will be set in the
.Em empty
state, indicating that, though the hardware is accessible, the data on the
subdisk is invalid. As a result of this state, the plex will be set in the
.Em faulty
state.
.Ss The `reviving' state
In many cases, when you start a subdisk the system must copy data to the
subdisk. Depending on the size of the subdisk, this can take a long time.
During this time, the subdisk is set in the
.Em reviving
state. On successful completion of the copy operation, it is automatically set
to the
.Em up
state. It is possible for the process performing the revive to be stopped and
restarted. The system keeps track of how far the subdisk has been revived, and
when the
.Ic start
command is reissued, the copying continues from this point.
.Pp
In order to maintain the consistency of a volume while one or more of its plexes
is being revived,
.Nm
writes to subdisks which have been revived up to the point of the write. It may
also read from the plex if the area being read has already been revived.
.Sh GOTCHAS
The following points are not bugs, and they have good reasons for existing, but
they have been shown to cause confusion. Each is discussed in the appropriate
section above.
.Bl -enum
.It
.Nm
drives are
.Ux
disk partitions and must have the partition type
.Em vinum .
This is different from
.Xr ccd 4 ,
which expects partitions of type
.Em 4.2BSD .
This behaviour of
.Nm ccd
is an invitation to shoot yourself in the foot: with
.Nm ccd
you can easily overwrite a file system.
.Nm
will not permit this.
.Pp
For similar reasons, the
.Nm Ic start
command will not accept a drive on partition
.Dq Li c .
Partition
.Dq Li c
is used by the system to represent the whole disk, and must be of type
.Em unused .
Clearly there is a conflict here, which
.Nm
resolves by not using the
.Dq Li c
partition.
.It
When you create a volume with multiple plexes,
.Nm
does not automatically initialize the plexes. This means that the contents are
not known, but they are certainly not consistent. As a result, by default
.Nm
sets the state of all newly-created plexes except the first to
.Em faulty .
In order to synchronize them with the first plex, you must
.Ic start
them, which causes
.Nm
to copy the data from a plex which is in the
.Em up
state. Depending on the size of the subdisks involved, this can take a long
time.
.Pp
In practice, people aren't too interested in what was in the plex when it was
created, and other volume managers cheat by setting them
.Em up
anyway.
.Nm
provides two ways to ensure that newly created plexes are
.Em up :
.Bl -bullet
.It
Create the plexes and then synchronize them with
.Nm Ic start .
.It
Create the volume (not the plex) with the keyword
.Cm setupstate ,
which tells
.Nm
to ignore any possible inconsistency and set the plexes to be
.Em up .
.El
.It
Some of the commands currently supported by
.Nm
are not really needed.
For reasons which I don't understand, however, I find
that users frequently try the
.Ic label
and
.Ic resetconfig
commands, though especially
.Ic resetconfig
outputs all sorts of dire warnings. Don't use these commands unless you have a
good reason to do so.
.It
Some state transitions are not very intuitive. In fact, it's not clear whether
this is a bug or a feature. If you find that you can't start an object in some
strange state, such as a
.Em reborn
subdisk, try first to get it into
.Em stopped
state, with the
.Ic stop
or
.Ic stop Fl f
commands. If that works, you should then be able to start it. If you find
that this is the only way to get out of a position where easier methods fail,
please report the situation.
.It
If you build the kernel module with the
.Fl D Ns Dv VINUMDEBUG
option, you must also build
.Nm
with the
.Fl D Ns Dv VINUMDEBUG
option, since the size of some data objects used by both components depends on
this option. If you don't do so, commands will fail with the message
.Sy Invalid argument ,
and a console message will be logged such as
.Bl -diag
.It "vinumioctl: invalid ioctl from process 247 (vinum): c0e44642"
.El
.Pp
This error may also occur if you use an old version of the KLD or of the
userland program.
.It
The
.Nm Ic read
command has a particularly emetic syntax. Once it was the only way to start
.Nm ,
but now the preferred method is with
.Nm Ic start .
.Nm Ic read
should be used for maintenance purposes only. Note that its syntax has changed,
and the arguments must be disk slices, such as
.Pa /dev/da0 ,
not partitions such as
.Pa /dev/da0e .
.El
.\"XXX.Sh BUGS
.Sh FILES
.Bl -tag -width /dev/vinum/control -compact
.It Pa /dev/vinum
directory with device nodes for
.Nm
objects
.It Pa /dev/vinum/control
control device for
.Nm
.It Pa /dev/vinum/plex
directory containing device nodes for
.Nm
plexes
.It Pa /dev/vinum/sd
directory containing device nodes for
.Nm
subdisks
.El
.Sh ENVIRONMENT
.Bl -tag -width VINUM_DATEFORMAT
.It Ev VINUM_HISTORY
The name of the log file, by default
.Pa /var/log/vinum_history .
.It Ev VINUM_DATEFORMAT
The format of dates in the log file, by default
.Qq Li %e %b %Y %H:%M:%S .
.It Ev EDITOR
The name of the editor to use for editing configuration files, by default
.Nm vi .
.El
.Sh SEE ALSO
.Xr strftime 3 ,
.Xr vinum 4 ,
.Xr disklabel 8 ,
.Xr newfs 8
.Pp
.Pa http://www.vinumvm.org/vinum/ ,
.Pa http://www.vinumvm.org/vinum/how-to-debug.html .
.Sh AUTHORS
.An Greg Lehey Aq grog@lemis.com
.Sh HISTORY
The
.Nm
command first appeared in
.Fx 3.0 .
The RAID-5 component of
.Nm
was developed for Cybernet Inc.\&
.Pq Pa www.cybernet.com
for its NetMAX product.