.\" Copyright (c) 1980, 1987, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" %sccs.include.redist.man%
.\"
.\"     @(#)uda.4	8.1 (Berkeley) 06/05/93
.\"
.Dd June 5, 1993
.Dt UDA 4 vax
.Os BSD 4
.Sh NAME
.Nm uda
.Nd
.Tn UDA50
disk controller interface
.Sh SYNOPSIS
.Cd "controller uda0 at uba0 csr 0172150 vector udaintr"
.Cd "disk ra0 at uda0 drive 0"
.Cd "options MSCP_PARANOIA"
.Sh DESCRIPTION
This is a driver for the
.Tn DEC UDA50
disk controller and other
compatible controllers.  The
.Tn UDA50
communicates with the host through
a packet protocol known as the Mass Storage Control Protocol
.Pq Tn MSCP .
Consult the file
.Aq Pa vax/mscp.h
for a detailed description of this protocol.
.Pp
The
.Nm uda
driver
is a typical block-device disk driver; see
.Xr physio 4
for a description of block
.Tn I/O .
The script
.Xr MAKEDEV 8
should be used to create the
.Nm uda
special files; should a special
file need to be created by hand, consult
.Xr mknod 8 .
.Pp
The
.Dv MSCP_PARANOIA
option enables runtime checking on all transfer completion responses
from the controller.  This increases disk
.Tn I/O
overhead and may
be undesirable on slow machines, but is otherwise recommended.
.Pp
The first sector of each disk contains both a first-stage bootstrap program
and a disk label containing geometry information and partition layouts (see
.Xr disklabel 5 ) .
This sector is normally write-protected, and disk-to-disk copies should
avoid copying this sector.
The label may be updated with
.Xr disklabel 8 ,
which can also be used to write-enable and write-disable the sector.
The next 15 sectors contain a second-stage bootstrap program.
.Sh DISK SUPPORT
During autoconfiguration,
as well as when a drive is opened after all partitions are closed,
the first sector of the drive is examined for a disk label.
If a label is found, the geometry of the drive and the partition tables
are taken from it.
If no label is found,
the driver configures the type of each drive when it is first
encountered.  A default partition table in the driver is used for each type
of disk when a pack is not labelled.  The origin and size
(in sectors) of the default pseudo-disks on each
drive are shown below.  Not all partitions begin on cylinder
boundaries, as on other drives, because previous drivers used one
partition table for all drive types.  Variants of the partition tables
are common; check the driver and the file
.Pa /etc/disktab
.Pq Xr disktab 5
for other possibilities.
.Pp
Special file names begin with
.Sq Li ra
and
.Sq Li rra
for the block and character files respectively.  The second
component of the name, a drive unit number in the range of zero to
seven, is represented by a
.Sq Li ?
in the disk layouts below.  The last component of the name is the
file system partition
designated
by a letter from
.Sq Li a
to
.Sq Li h
and which corresponds to a minor device number set: zero to seven,
eight to 15, 16 to 23 and so forth for drive zero, drive one and drive
two respectively (see
.Xr physio 4 ) .
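.Pp
The numbering described above can be sketched in C.
This is only an illustration, assuming eight partitions per drive and
unit numbers zero to seven as stated in this page; the name
ra_minor is hypothetical and does not come from the driver.
.Bd -literal -offset indent
```c
/* Hypothetical helper, not part of the uda driver.  Assumes eight
 * partitions (a through h) per drive, as described in this page. */
#define RA_PARTS_PER_UNIT 8

/* Return the minor device number for drive `unit' (0 to 7) and
 * partition letter `part' ('a' to 'h'), or -1 if out of range. */
int
ra_minor(int unit, char part)
{
	if (unit < 0 || unit > 7 || part < 'a' || part > 'h')
		return -1;
	return unit * RA_PARTS_PER_UNIT + (part - 'a');
}
```
.Ed
.Pp
Under these assumptions, partition c of drive two would have minor
number 18, which falls in the 16 to 23 set.
.Pp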
The location and size (in sectors) of the partitions:
.Bl -column header diskx undefined length
.It Tn RA60 No partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 15884 Ta 33440
.It ra?c Ta 0 Ta 400176
.It ra?d Ta 49324 Ta 82080 Ta same as 4.2BSD ra?g
.It ra?e Ta 131404 Ta 268772 Ta same as 4.2BSD ra?h
.It ra?f Ta 49324 Ta 350852
.It ra?g Ta 242606 Ta 157570
.It ra?h Ta 49324 Ta 193282
.It Tn RA70 No partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 15972 Ta 33440
.It ra?c Ta 0 Ta 547041
.It ra?d Ta 34122 Ta 15884
.It ra?e Ta 357192 Ta 55936
.It ra?f Ta 413457 Ta 133584
.It ra?g Ta 341220 Ta 205821
.It ra?h Ta 49731 Ta 29136
.It Tn RA80 No partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 15884 Ta 33440
.It ra?c Ta 0 Ta 242606
.It ra?e Ta 49324 Ta 193282 Ta same as old Berkeley ra?g
.It ra?f Ta 49324 Ta 82080 Ta same as 4.2BSD ra?g
.It ra?g Ta 49910 Ta 192696
.It ra?h Ta 131404 Ta 111202 Ta same as 4.2BSD
.It Tn RA81 No partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 16422 Ta 66880
.It ra?c Ta 0 Ta 891072
.It ra?d Ta 375564 Ta 15884
.It ra?e Ta 391986 Ta 307200
.It ra?f Ta 699720 Ta 191352
.It ra?g Ta 375564 Ta 515508
.It ra?h Ta 83538 Ta 291346
.It Tn RA81 No partitions with 4.2BSD-compatible partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 16422 Ta 66880
.It ra?c Ta 0 Ta 891072
.It ra?d Ta 49324 Ta 82080 Ta same as 4.2BSD ra?g
.It ra?e Ta 131404 Ta 759668 Ta same as 4.2BSD ra?h
.It ra?f Ta 412490 Ta 478582 Ta same as 4.2BSD ra?f
.It ra?g Ta 375564 Ta 515508
.It ra?h Ta 83538 Ta 291346
.It Tn RA82 No partitions
.It Sy disk Ta Sy start Ta Sy length
.It ra?a Ta 0 Ta 15884
.It ra?b Ta 16245 Ta 66880
.It ra?c Ta 0 Ta 1135554
.It ra?d Ta 375345 Ta 15884
.It ra?e Ta 391590 Ta 307200
.It ra?f Ta 669390 Ta 466164
.It ra?g Ta 375345 Ta 760209
.It ra?h Ta 83790 Ta 291346
.El
.Pp
The ra?a partition is normally used for the root file system, the ra?b
partition as a paging area, and the ra?c partition for pack-to-pack
copying (it maps the entire disk).
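.Pp
Because the default partitions overlap one another, it can help to check
a start and length pair against the size of the whole disk (the length
of the ra?c partition) before relying on it.
The following C sketch shows one way to do so; the structure and
function names are hypothetical and are not taken from the driver.
.Bd -literal -offset indent
```c
/* Hypothetical types and helpers for illustration only. */
struct ra_part {
	long start;	/* first sector of the partition */
	long length;	/* size of the partition in sectors */
};

/* Return 1 if the partition lies entirely within a disk of
 * `disksize' sectors (the length of the c partition), else 0. */
int
ra_part_fits(struct ra_part p, long disksize)
{
	if (p.start < 0 || p.length < 0)
		return 0;
	return p.start + p.length <= disksize;
}

/* Return 1 if two partitions share at least one sector, else 0.
 * Empty partitions never overlap anything. */
int
ra_part_overlap(struct ra_part a, struct ra_part b)
{
	if (a.length <= 0 || b.length <= 0)
		return 0;
	return a.start < b.start + b.length &&
	    b.start < a.start + a.length;
}
```
.Ed
.Pp
With the RA60 values listed above, ra?f (start 49324, length 350852)
ends exactly at sector 400176, the size of ra?c, and ra?d and ra?h
overlap, so only one of them should hold a file system at a time.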
.Sh FILES
.Bl -tag -width /dev/rra[0-9][a-f] -compact
.It Pa /dev/ra[0-9][a-f]
.It Pa /dev/rra[0-9][a-f]
.El
.Sh DIAGNOSTICS
.Bl -diag
.It "panic: udaslave"
No command packets were available while the driver was looking
for disk drives.  The controller is not extending enough credits
to use the drives.
.Pp
.It "uda%d: no response to Get Unit Status request"
A disk drive was found, but did not respond to a status request.
This is either a hardware problem or someone pulling unit number
plugs very fast.
.Pp
.It "uda%d: unit %d off line"
While searching for drives, the controller found one that
seems to be manually disabled.  It is ignored.
.Pp
.It "uda%d: unable to get unit status"
Something went wrong while trying to determine the status of
a disk drive.  This is followed by an error detail.
.Pp
.It uda%d: unit %d, next %d
This probably never happens, but I wanted to know if it did.  I
have no idea what one should do about it.
.Pp
.It "uda%d: cannot handle unit number %d (max is %d)"
The controller found a drive whose unit number is too large.
Valid unit numbers are those in the range [0..7].
.Pp
.It "ra%d: don't have a partition table for %s; using (s,t,c)=(%d,%d,%d)"
The controller found a drive whose media identifier (e.g. `RA 25')
does not have a default partition table.  A temporary partition
table containing only an `a' partition has been created covering
the entire disk, which has the indicated numbers of sectors per
track (s), tracks per cylinder (t), and total cylinders (c).
Give the pack a label with the
.Xr disklabel 8
utility.
.Pp
.It "uda%d: uballoc map failed"
Unibus resource map allocation failed during initialisation.  This
can only happen if you have 496 devices on a Unibus.
.Pp
.It uda%d: timeout during init
The controller did not initialise within ten seconds.
A hardware
problem, but it sometimes goes away if you try again.
.Pp
.It uda%d: init failed, sa=%b
The controller refused to initialise.
.Pp
.It uda%d: controller hung
The controller never finished initialisation.  Retrying may sometimes
fix it.
.Pp
.It ra%d: drive will not come on line
The drive will not come on line, probably because it is spun down.
This should be preceded by a message giving details as to why the
drive stayed off line.
.Pp
.It uda%d: still hung
When the controller hangs, the driver occasionally tries to reinitialise
it.  This means it just tried, without success.
.Pp
.It panic: udastart: bp==NULL
A bug in the driver has put an empty drive queue on a controller queue.
.Pp
.It uda%d: command ring too small
If you increase
.Dv NCMDL2 ,
you may see a performance improvement.
(See
.Pa /sys/vaxuba/uda.c . )
.Pp
.It panic: udastart
A drive was found marked for status or on-line functions while performing
status or on-line functions.  This indicates a bug in the driver.
.Pp
.It "uda%d: controller error, sa=0%o (%s)"
The controller reported an error.  The error code is printed in
octal, along with a short description if the code is known (see the
.%T UDA50 Maintenance Guide ,
.Tn DEC
part number
.Tn AA-M185B-TC ,
pp. 18-22).
If this occurs during normal
operation, the driver will reset it and retry pending
.Tn I/O .
If
it occurs during configuration, the controller may be ignored.
.Pp
.It uda%d: stray intr
The controller interrupted when it should have stayed quiet.  The
interrupt has been ignored.
.Pp
.It "uda%d: init step %d failed, sa=%b"
The controller reported an error during the named initialisation step.
The driver will retry initialisation later.
.Pp
.It uda%d: version %d model %d
An informational message giving the revision level of the controller.
.Pp
.It uda%d: DMA burst size set to %d
An informational message showing the
.Tn DMA
burst size, in words.
.Pp
.It panic: udaintr
Indicates a bug in the generic
.Tn MSCP
code.
.Pp
.It uda%d: driver bug, state %d
The driver has a bogus value for the controller state.  Something
is quite wrong.  This is immediately followed by a `panic: udastate'.
.Pp
.It uda%d: purge bdp %d
A benign message tracing BDP purges.  I have been trying to figure
out what BDP purges are for.  You might want to comment out this
call to log() in /sys/vaxuba/uda.c.
.Pp
.It uda%d: SETCTLRC failed: `detail'
The Set Controller Characteristics command (the last part of the
controller initialisation sequence) failed.  The
.Em detail
message tells why.
.Pp
.It "uda%d: attempt to bring ra%d on line failed: `detail'"
The drive could not be brought on line.  The
.Em detail
message tells why.
.Pp
.It uda%d: ra%d: unknown type %d
The type index of the named drive is not known to the driver, so the
drive will be ignored.
.Pp
.It "ra%d: changed types! was %d now %d"
A drive somehow changed from one kind to another, e.g., from an
.Tn RA80
to an
.Tn RA60 .
The numbers printed are the encoded media identifiers (see
.Ao Pa vax/mscp.h Ac
for the encoding).
The driver believes the new type.
.Pp
.It "ra%d: uda%d, unit %d, size = %d sectors"
The named drive is on the indicated controller as the given unit,
and has that many sectors of user-file area.  This is printed
during configuration.
.Pp
.It "uda%d: attempt to get status for ra%d failed: `detail'"
A status request failed.  The
.Em detail
message should tell why.
.Pp
.It ra%d: bad block report: %d
The drive has reported the given block as bad.  If there are multiple
bad blocks, the drive will report only the first; in this case this
message will be followed by `+ others'.
Get
.Tn DEC
to forward the
block with
.Tn EVRLK .
.Pp
.It ra%d: serious exception reported
I have no idea what this really means.
.Pp
.It panic: udareplace
The controller reported completion of a
.Tn REPLACE
operation.  The
driver never issues any
.Tn REPLACE Ns s ,
so something is wrong.
.Pp
.It panic: udabb
The controller reported completion of bad block related
.Tn I/O .
The
driver never issues any such, so something is wrong.
.Pp
.It uda%d: lost interrupt
The controller has gone out to lunch, and is being reset to try to bring
it back.
.Pp
.It panic: mscp_go: AEB_MAX_BP too small
You defined
.Dv AVOID_EMULEX_BUG
and increased
.Dv NCMDL2
and Emulex has
new firmware.  Raise
.Dv AEB_MAX_BP
or turn off
.Dv AVOID_EMULEX_BUG .
.Pp
.It "uda%d: unit %d: unknown message type 0x%x ignored"
The controller responded with a mysterious message type.  See
.Pa /sys/vax/mscp.h
for a list of known message types.  This is probably
a controller hardware problem.
.Pp
.It "uda%d: unit %d out of range"
The disk drive unit number (the unit plug) is higher than the
maximum number the driver allows (currently 7).
.Pp
.It "uda%d: unit %d not configured, message ignored"
The named disk drive has announced its presence to the controller,
but was not, or cannot now be, configured into the running system.
.Em Message
is one of `available attention' (an `I am here' message) or
`stray response op 0x%x status 0x%x' (anything else).
.Pp
.It ra%d: bad lbn (%d)?
The drive has reported an invalid command error, probably due to an
invalid block number.  If the lbn value is very much greater than the
size reported by the drive, this is the problem.  It is probably due to
an improperly configured partition table.  Other invalid commands
indicate a bug in the driver, or hardware trouble.
.Pp
.It ra%d: duplicate ONLINE ignored
The drive has come on-line while already on-line.  This condition
can probably be ignored (and has been).
.Pp
.It ra%d: io done, but no buffer?
Hardware trouble, or a bug; the drive has finished an
.Tn I/O
request,
but the response has an invalid (zero) command reference number.
.Pp
.It "Emulex SC41/MS screwup: uda%d, got %d correct, then changed 0x%x to 0x%x"
You turned on
.Dv AVOID_EMULEX_BUG ,
and the driver successfully
avoided the bug.  The number of correctly-handled requests is
reported, along with the expected and actual values relating to
the bug being avoided.
.Pp
.It panic: unrecoverable Emulex screwup
You turned on
.Dv AVOID_EMULEX_BUG ,
but Emulex was too clever and
avoided the avoidance.  Try turning on
.Dv MSCP_PARANOIA
instead.
.Pp
.It uda%d: bad response packet ignored
You turned on
.Dv MSCP_PARANOIA ,
and the driver caught the controller in
a lie.  The lie has been ignored, and the controller will soon be
reset (after a `lost' interrupt).  This is followed by a hex dump of
the offending packet.
.Pp
.It ra%d: bogus REPLACE end
The drive has reported finishing a bad sector replacement, but the
driver never issues bad sector replacement commands.  The report
is ignored.  This is likely a hardware problem.
.Pp
.It "ra%d: unknown opcode 0x%x status 0x%x ignored"
The drive has reported something that the driver cannot understand.
Perhaps
.Tn DEC
has been inventive, or perhaps your hardware is ill.
This is followed by a hex dump of the offending packet.
.Pp
.It "ra%d%c: hard error %sing fsbn %d [of %d-%d] (ra%d bn %d cn %d tn %d sn %d)."
An unrecoverable error occurred during transfer of the specified
filesystem block number(s),
which are logical block numbers on the indicated partition.
If the transfer involved multiple blocks, the block range is printed as well.
The parenthesized fields list the actual disk sector number
relative to the beginning of the drive,
as well as the cylinder, track and sector number of the block.
.Pp
.It uda%d: %s error datagram
The controller has reported some kind of error, either `hard'
(unrecoverable) or `soft' (recoverable).  If the controller is going on
(attempting to fix the problem), this message includes the remark
`(continuing)'.  Emulex controllers wrongly claim that all soft errors
are hard errors.  This message may be followed by
one of the following 5 messages, depending on its type, and will always
be followed by a failure detail message (also listed below).
.Bd -filled -offset indent
.It memory addr 0x%x
A host memory access error; this is the address that could not be
read.
.Pp
.It "unit %d: level %d retry %d, %s %d"
A typical disk error; the retry count and error recovery levels are
printed, along with the block type (`lbn', or logical block; or `rbn',
or replacement block) and number.  If the string is something else,
.Tn DEC
has been clever, or your hardware has gone to Australia for vacation
(unless you live there; then it might be in New Zealand, or Brazil).
.Pp
.It unit %d: %s %d
Also a disk error, but an `SDI' error, whatever that is.  (I doubt
it has anything to do with Ronald Reagan.)  This lists the block
type (`lbn' or `rbn') and number.  This is followed by a second
message indicating a microprocessor error code and a front panel
code.  These latter codes are drive-specific, and are intended to
be used by field service as an aid in locating failing hardware.
The codes for RA81s can be found in the
.%T RA81 Maintenance Guide ,
DEC order number AA-M879A-TC, in appendices E and F.
.Pp
.It "unit %d: small disk error, cyl %d"
Yet another kind of disk error, but for small disks.  (`That's what
it says, guv'nor.
Dunnask me what it means.')
.Pp
.It "unit %d: unknown error, format 0x%x"
A mysterious error: the given format code is not known.
.Ed
.Pp
The detail messages are as follows:
.Bd -filled -offset indent
.It success (%s) (code 0, subcode %d)
Everything worked, but the controller thought it would let you know
that something went wrong.  No matter what subcode, this can probably
be ignored.
.Pp
.It "invalid command (%s) (code 1, subcode %d)"
This probably cannot occur unless the hardware is out; %s should be
`invalid msg length', meaning some command was too short or too long.
.Pp
.It "command aborted (unknown subcode) (code 2, subcode %d)"
This should never occur, as the driver never aborts commands.
.Pp
.It "unit offline (%s) (code 3, subcode %d)"
The drive is offline, either because it is not around (`unknown
drive'), stopped (`not mounted'), out of order (`inoperative'), has the
same unit number as some other drive (`duplicate'), or has been
disabled for diagnostics (`in diagnosis').
.Pp
.It "unit available (unknown subcode) (code 4, subcode %d)"
The controller has decided to report a perfectly normal event as
an error.  (Why?)
.Pp
.It "media format error (%s) (code 5, subcode %d)"
The drive cannot be used without reformatting.  The Format Control
Table cannot be read (`fct unread - edc'), there is a bad sector
header (`invalid sector header'), the drive is not set for 512-byte
sectors (`not 512 sectors'), the drive is not formatted (`not formatted'),
or the
.Tn FCT
has an uncorrectable
.Tn ECC
error (`fct ecc').
.Pp
.It "write protected (%s) (code 6, subcode %d)"
The drive is write protected, either by the front panel switch
(`hardware') or via the driver (`software').  The driver never
sets software write protect.
.Pp
.It "compare error (unknown subcode) (code 7, subcode %d)"
A compare operation showed some sort of difference.
The driver
never uses compare operations.
.Pp
.It "data error (%s) (code 8, subcode %d)"
Something went wrong reading or writing a data sector.  A `forced
error' is a software-asserted error used to mark a sector that contains
suspect data.  Rewriting the sector will clear the forced error.  This
is normally set only during bad block replacement, and the driver does
no bad block replacement, so these should not occur.  A `header
compare' error probably means the block is shot.  A `sync timeout'
presumably has something to do with sector synchronisation.
An `uncorrectable ecc' error is an ordinary data error that cannot
be fixed via
.Tn ECC
logic.  A `%d symbol ecc' error is a data error
that can be (and presumably has been) corrected by the
.Tn ECC
logic.
It might indicate a sector that is imperfect but usable, or that
is starting to go bad.  If any of these errors recur, the sector
may need to be replaced.
.Pp
.It "host buffer access error (%s) (code %d, subcode %d)"
Something went wrong while trying to copy data to or from the host
(Vax).  The subcode is one of `odd xfer addr', `odd xfer count',
`non-exist. memory', or `memory parity'.  The first two could be a
software glitch; the last two indicate hardware problems.
.Pp
.It controller error (%s) (code %d, subcode %d)
The controller has detected a hardware error in itself.  A
`serdes overrun' is a serialiser / deserialiser overrun; `edc'
probably stands for `error detection code'; and `inconsistent
internal data struct' is obvious.
.Pp
.It "drive error (%s) (code %d, subcode %d)"
Either the controller or the drive has detected a hardware error
in the drive.  I am not sure what an `sdi command timeout' is, but
these seem to occur benignly on occasion.  A `ctlr detected protocol'
error means that the controller and drive do not agree on a protocol;
this could be a cabling problem, or a version mismatch.
A `positioner'
error means the drive seek hardware is ailing; `lost rd/wr ready'
means the drive read/write logic is sick; and `drive clock dropout'
means that the drive clock logic is bad, or the media is hopelessly
scrambled.  I have no idea what `lost recvr ready' means.  A `drive
detected error' is a catch-all for drive hardware trouble; `ctlr
detected pulse or parity' errors are often caused by cabling problems.
.Ed
.El
.Sh SEE ALSO
.Xr disklabel 5 ,
.Xr disklabel 8
.Sh HISTORY
The
.Nm
driver appeared in
.Bx 4.2 .