.\" $NetBSD: raid.4,v 1.36 2009/05/04 20:37:07 wiz Exp $
.\"
.\" Copyright (c) 1998 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Greg Oster
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\"
.\" Copyright (c) 1995 Carnegie-Mellon University.
.\" All rights reserved.
.\"
.\" Author: Mark Holland
.\"
.\" Permission to use, copy, modify and distribute this software and
.\" its documentation is hereby granted, provided that both the copyright
.\" notice and this permission notice appear in all copies of the
.\" software, derivative works or modified versions, and any portions
.\" thereof, and that both notices appear in supporting documentation.
.\"
.\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
.\" CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
.\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.\"
.\" Carnegie Mellon requests users of this software to return to
.\"
.\"  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
.\"  School of Computer Science
.\"  Carnegie Mellon University
.\"  Pittsburgh PA 15213-3890
.\"
.\" any improvements or extensions that they make and grant Carnegie the
.\" rights to redistribute these changes.
.\"
.Dd August 6, 2007
.Dt RAID 4
.Os
.Sh NAME
.Nm raid
.Nd RAIDframe disk driver
.Sh SYNOPSIS
.Cd options RAID_AUTOCONFIG
.Cd options RAID_DIAGNOSTIC
.Cd options RF_ACC_TRACE=n
.Cd options RF_DEBUG_MAP=n
.Cd options RF_DEBUG_PSS=n
.Cd options RF_DEBUG_QUEUE=n
.Cd options RF_DEBUG_QUIESCE=n
.Cd options RF_DEBUG_RECON=n
.Cd options RF_DEBUG_STRIPELOCK=n
.Cd options RF_DEBUG_VALIDATE_DAG=n
.Cd options RF_DEBUG_VERIFYPARITY=n
.Cd options RF_INCLUDE_CHAINDECLUSTER=n
.Cd options RF_INCLUDE_EVENODD=n
.Cd options RF_INCLUDE_INTERDECLUSTER=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING=n
.Cd options RF_INCLUDE_PARITY_DECLUSTERING_DS=n
.Cd options RF_INCLUDE_PARITYLOGGING=n
.Cd options RF_INCLUDE_RAID5_RS=n
.Pp
.Cd "pseudo-device raid" Op Ar count
.Sh DESCRIPTION
The
.Nm
driver provides RAID 0, 1, 4, and 5 (and more!) capabilities to
.Nx .
This document assumes that the reader has at least some familiarity
with RAID and RAID concepts.
The reader is also assumed to know how to configure
disks and pseudo-devices into kernels, how to generate kernels, and how
to partition disks.
.Pp
RAIDframe provides a number of different RAID levels including:
.Bl -tag -width indent
.It RAID 0
provides simple data striping across the components.
.It RAID 1
provides mirroring.
.It RAID 4
provides data striping across the components, with parity
stored on a dedicated drive (in this case, the last component).
.It RAID 5
provides data striping across the components, with parity
distributed across all the components.
.El
.Pp
There are a wide variety of other RAID levels supported by RAIDframe.
The configuration file options to enable them are briefly outlined
at the end of this section.
.Pp
Depending on the parity level configured, the device driver can
support the failure of component drives.
The number of failures
allowed depends on the parity level selected.
If the driver is able
to handle drive failures, and a drive does fail, then the system is
operating in "degraded mode".
In this mode, all missing data must be
reconstructed from the data and parity present on the other
components.
This results in much slower data accesses, but
does mean that a failure need not bring the system to a complete halt.
.Pp
The RAID driver supports and enforces the use of
.Sq component labels .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, and whether the data (and parity) on the component is
.Sq clean .
The component label currently lives at the half-way point of the
.Sq reserved section
located at the beginning of each component.
This
.Sq reserved section
is RF_PROTECTED_SECTORS in length (64 blocks or 32 Kbytes) and the
component label is currently 1 Kbyte in size.
.Pp
If the driver determines that the component labels are very inconsistent with
respect to each other (e.g. two or more serial numbers do not match)
or that the component label is not consistent with its assigned place
in the set (e.g. the component label claims the component should be
the 3rd one in a 6-disk set, but the RAID set has it as the 3rd component
in a 5-disk set) then the device will fail to configure.
If the driver determines that exactly one component label seems to be
incorrect, and the RAID set is being configured as a set that supports
a single failure, then the RAID set will be allowed to configure, but
the incorrectly labeled component will be marked as
.Sq failed ,
and the RAID set will begin operation in degraded mode.
If all of the components are consistent among themselves, the RAID set
will configure normally.
.Pp
Component labels are also used to support the auto-detection and
autoconfiguration of RAID sets.
A RAID set can be flagged as
autoconfigurable, in which case it will be configured automatically
during the kernel boot process.
RAID file systems which are
automatically configured are also eligible to be the root file system.
There is currently only limited support (alpha, amd64, i386, pmax,
sparc, sparc64, and vax architectures)
for booting a kernel directly from a RAID 1 set, and no support for
booting from any other RAID sets.
To use a RAID set as the root
file system, a kernel is usually obtained from a small non-RAID
partition, after which any autoconfiguring RAID set can be used for the
root file system.
See
.Xr raidctl 8
for more information on autoconfiguration of RAID sets.
Note that with autoconfiguration of RAID sets, it is no longer
necessary to hard-code SCSI IDs of drives.
The autoconfiguration code will
correctly configure a device even after any number of the components
have had their device IDs or device names changed.
.Pp
The driver supports
.Sq hot spares ,
disks which are on-line, but are not
actively used in an existing file system.
Should a disk fail, the
driver is capable of reconstructing the failed disk onto a hot spare
or back onto a replacement drive.
If the components are hot swappable, the failed disk can then be
removed, a new disk put in its place, and a copyback operation
performed.
The copyback operation, as its name indicates, will copy
the reconstructed data from the hot spare to the previously failed
(and now replaced) disk.
Hot spares can also be hot-added using
.Xr raidctl 8 .
.Pp
If a component cannot be detected when the RAID device is configured,
that component will simply be marked as
.Sq failed .
.Pp
The user-land utility for doing all
.Nm
configuration and other operations
is
.Xr raidctl 8 .
Most importantly,
.Xr raidctl 8
must be used with the
.Fl i
option to initialize all RAID sets.
In particular, this
initialization includes re-building the parity data.
This rebuilding of parity data is also required either a) when a new
RAID device is brought up for the first time or b) after an un-clean
shutdown of a RAID device.
By using the
.Fl P
option to
.Xr raidctl 8 ,
and performing this on-demand recomputation of all parity
before doing a
.Xr fsck 8
or a
.Xr newfs 8 ,
file system integrity and parity integrity can be ensured.
It bears repeating that parity recomputation is
.Ar required
before any file systems are created or used on the RAID device.
If the
parity is not correct, then missing data cannot be correctly recovered.
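.Pp
As an illustrative sketch only (the RAID device name
.Pa raid0
and the partition letter
.Pa e
are assumptions; the correct partition letter depends on the
disklabel and architecture), a typical first-time sequence is to
initialize the parity, create the file system, and then check it:
.Bd -unfilled -offset indent
raidctl -i raid0	# initialize the RAID set, re-building parity
newfs /dev/rraid0e	# create a file system on the new set
fsck /dev/rraid0e	# verify file system integrity
.Ed
.Pp
After an un-clean shutdown, only the parity check/rewrite step
.Pq Ic "raidctl -P raid0"
need be performed before
.Xr fsck 8
is run.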
.Pp
RAID levels may be combined in a hierarchical fashion.
For example, a RAID 0
device can be constructed out of a number of RAID 5 devices (which, in turn,
may be constructed out of the physical disks, or of other RAID devices).
.Pp
The first step to using the
.Nm
driver is to ensure that it is suitably configured in the kernel.
This is done by adding a line similar to:
.Bd -unfilled -offset indent
pseudo-device raid 4      # RAIDframe disk device
.Ed
.Pp
to the kernel configuration file.
The
.Sq count
argument
.Sq ( 4 ,
in this case), specifies the number of RAIDframe drivers to configure.
To turn on component auto-detection and autoconfiguration of RAID
sets, simply add:
.Bd -unfilled -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
to the kernel configuration file.
.Pp
All component partitions must be of the type
.Dv FS_BSDFFS
(e.g. 4.2BSD) or
.Dv FS_RAID .
The use of the latter is strongly encouraged, and is required if
autoconfiguration of the RAID set is desired.
Since RAIDframe leaves
room for disklabels, RAID components can be simply raw disks, or
partitions which use an entire disk.
.Pp
A more detailed treatment of actually using a
.Nm
device is found in
.Xr raidctl 8 .
It is highly recommended that the steps to reconstruct, copyback, and
re-compute parity are well understood by the system administrator(s)
.Ar before
a component failure.
Doing the wrong thing when a component fails may
result in data loss.
.Pp
Additional internal consistency checking can be enabled by specifying:
.Bd -unfilled -offset indent
options RAID_DIAGNOSTIC
.Ed
.Pp
These assertions are disabled by default in order to improve
performance.
.Pp
RAIDframe supports an access tracing facility for tracking both
requests made and performance of various parts of the RAID systems
as the request is processed.
To enable this tracing the following option may be specified:
.Bd -unfilled -offset indent
options RF_ACC_TRACE=1
.Ed
.Pp
For extensive debugging there are a number of kernel options which
will aid in performing extra diagnosis of various parts of the
RAIDframe sub-systems.
Note that in order to make full use of these options it is often
necessary to enable one or more debugging options as listed in
.Pa src/sys/dev/raidframe/rf_options.h .
These options are typically only useful to people who wish
to debug various parts of RAIDframe.
The options include:
.Pp
For debugging the code which maps RAID addresses to physical
addresses:
.Bd -unfilled -offset indent
options RF_DEBUG_MAP=1
.Ed
.Pp
Parity stripe status debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_PSS=1
.Ed
.Pp
Additional debugging for queuing is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_QUEUE=1
.Ed
.Pp
Problems with non-quiescent file systems should be easier to debug if
the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_QUIESCE=1
.Ed
.Pp
Stripelock debugging is enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_STRIPELOCK=1
.Ed
.Pp
Additional diagnostic checks during reconstruction are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_RECON=1
.Ed
.Pp
Validation of the DAGs (Directed Acyclic Graphs) used to describe an
I/O access can be performed when the following is enabled:
.Bd -unfilled -offset indent
options RF_DEBUG_VALIDATE_DAG=1
.Ed
.Pp
Additional diagnostics during parity verification are enabled with:
.Bd -unfilled -offset indent
options RF_DEBUG_VERIFYPARITY=1
.Ed
.Pp
There are a number of less commonly used RAID levels supported by
RAIDframe.
These additional RAID types should be considered experimental, and
may not be ready for production use.
The various types and the options to enable them are shown here:
.Pp
For Even-Odd parity:
.Bd -unfilled -offset indent
options RF_INCLUDE_EVENODD=1
.Ed
.Pp
For RAID level 5 with rotated sparing:
.Bd -unfilled -offset indent
options RF_INCLUDE_RAID5_RS=1
.Ed
.Pp
For Parity Logging (highly experimental):
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITYLOGGING=1
.Ed
.Pp
For Chain Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_CHAINDECLUSTER=1
.Ed
.Pp
For Interleaved Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_INTERDECLUSTER=1
.Ed
.Pp
For Parity Declustering:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING=1
.Ed
.Pp
For Parity Declustering with Distributed Spares:
.Bd -unfilled -offset indent
options RF_INCLUDE_PARITY_DECLUSTERING_DS=1
.Ed
.Pp
The reader is referred to the RAIDframe documentation mentioned in the
.Sx HISTORY
section for more detail on these various RAID configurations.
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However, the loss of two
components of a RAID 4 or 5 system, or the loss of a single component
of a RAID 0 system, will result in the loss of all file systems on
that RAID device.
RAID is
.Ar NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Ar MUST
be performed whenever there is a chance that it may have been
compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity
correct will be catastrophic should a component ever fail -- it is
better to use RAID 0 and get the additional space and speed, than it
is to use parity, but not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Nm
device special files.
.El
.Sh SEE ALSO
.Xr config 1 ,
.Xr sd 4 ,
.Xr fsck 8 ,
.Xr MAKEDEV 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr raidctl 8
.Sh HISTORY
The
.Nm
driver in
.Nx
is a port of RAIDframe, a framework for rapid prototyping of RAID
structures developed by the folks at the Parallel Data Laboratory at
Carnegie Mellon University (CMU).
RAIDframe, as originally distributed
by CMU, provides a RAID simulator for a number of different
architectures, and a user-level device driver and a kernel device
driver for Digital Unix.
The
.Nm
driver is a kernelized version of RAIDframe v1.1.
.Pp
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
The
.Nm
driver first appeared in
.Nx 1.4 .
.Sh COPYRIGHT
.Bd -unfilled
The RAIDframe Copyright is as follows:
.Pp
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.
.Pp
Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.
.Pp
CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
.Pp
Carnegie Mellon requests users of this software to return to
.Pp
 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890
.Pp
any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed