.\" Copyright (c) 1980, 1987, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" %sccs.include.redist.man%
.\"
.\"     @(#)uda.4	8.1 (Berkeley) 06/05/93
.\"
.Dd
.Dt UDA 4 vax
.Os BSD 4
.Sh NAME
.Nm uda
.Nd
.Tn UDA50
disk controller interface
.Sh SYNOPSIS
.Cd "controller uda0 at uba0 csr 0172150 vector udaintr"
.Cd "disk ra0 at uda0 drive 0"
.Cd "options MSCP_PARANOIA"
.Sh DESCRIPTION
This is a driver for the
.Tn DEC UDA50
disk controller and other
compatible controllers.  The
.Tn UDA50
communicates with the host through
a packet protocol known as the Mass Storage Control Protocol
.Pq Tn MSCP .
Consult the file
.Aq Pa vax/mscp.h
for a detailed description of this protocol.
.Pp
The
.Nm uda
driver
is a typical block-device disk driver; see
.Xr physio 4
for a description of block
.Tn I/O .
The script
.Xr MAKEDEV 8
should be used to create the
.Nm uda
special files; should a special
file need to be created by hand, consult
.Xr mknod 8 .
.Pp
The
.Dv MSCP_PARANOIA
option enables runtime checking on all transfer completion responses
from the controller.  This increases disk
.Tn I/O
overhead and may
be undesirable on slow machines, but is otherwise recommended.
.Pp
The first sector of each disk contains both a first-stage bootstrap program
and a disk label containing geometry information and partition layouts (see
.Xr disklabel 5 ) .
This sector is normally write-protected, and disk-to-disk copies should
avoid copying this sector.
The label may be updated with
.Xr disklabel 8 ,
which can also be used to write-enable and write-disable the sector.
The next 15 sectors contain a second-stage bootstrap program.
.Sh DISK SUPPORT
During autoconfiguration,
as well as when a drive is opened after all partitions are closed,
the first sector of the drive is examined for a disk label.
If a label is found, the geometry of the drive and the partition tables
are taken from it.
If no label is found,
the driver configures the type of each drive when it is first
encountered.  A default partition table in the driver is used for each type
of disk when a pack is not labelled.  The origin and size
(in sectors) of the default pseudo-disks on each
drive are shown below.  Not all partitions begin on cylinder
boundaries, as on other drives, because previous drivers used one
partition table for all drive types.  Variants of the partition tables
are common; check the driver and the file
.Pa /etc/disktab
.Pq Xr disktab 5
for other possibilities.
.Pp
Special file names begin with
.Sq Li ra
and
.Sq Li rra
for the block and character files respectively.  The second
component of the name, a drive unit number in the range of zero to
seven, is represented by a
.Sq Li ?
in the disk layouts below.  The last component of the name is the
file system partition,
designated
by a letter from
.Sq Li a
to
.Sq Li h .
Each drive owns a set of eight minor device numbers, one per partition:
zero to seven, eight to 15, 16 to 23 and so forth for drive zero, drive
one and drive two respectively (see
.Xr physio 4 ) .
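.Pp
As an illustration of the numbering just described, the following C sketch
maps a drive unit and partition letter to a minor device number and back.
The formula (unit times eight, plus the partition index) is inferred from
the ranges given in this section rather than taken from the driver source,
and the names ra_minor, ra_unit and ra_part are hypothetical.

```c
#include <assert.h>

/* Partitions 'a' through 'h' per drive; assumed from the text above. */
#define RA_PARTS_PER_UNIT 8
/* Highest drive unit number mentioned in this page. */
#define RA_MAX_UNIT 7

/*
 * Hypothetical helper: minor device number for drive `unit` and
 * partition letter `part`.  Returns -1 if either value is out of range.
 */
int ra_minor(int unit, char part)
{
    if (unit < 0 || unit > RA_MAX_UNIT || part < 'a' || part > 'h')
        return -1;
    return unit * RA_PARTS_PER_UNIT + (part - 'a');
}

/* Inverse mapping: drive unit encoded in a minor number. */
int ra_unit(int minor_no)
{
    assert(minor_no >= 0);
    return minor_no / RA_PARTS_PER_UNIT;
}

/* Inverse mapping: partition letter encoded in a minor number. */
char ra_part(int minor_no)
{
    assert(minor_no >= 0);
    return (char)('a' + minor_no % RA_PARTS_PER_UNIT);
}
```

Under these assumptions, ra_minor(1, 'a') yields 8 and ra_minor(2, 'h')
yields 23, matching the ranges listed for drives one and two.
.Pp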
The location and size (in sectors) of the partitions:
.Bl -column header diskx undefined length
.Tn RA60 No partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	15884	33440
	ra?c	0	400176
	ra?d	49324	82080	same as 4.2BSD ra?g
	ra?e	131404	268772	same as 4.2BSD ra?h
	ra?f	49324	350852
	ra?g	242606	157570
	ra?h	49324	193282

.Tn RA70 No partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	15972	33440
	ra?c	0	547041
	ra?d	34122	15884
	ra?e	357192	55936
	ra?f	413457	133584
	ra?g	341220	205821
	ra?h	49731	29136

.Tn RA80 No partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	15884	33440
	ra?c	0	242606
	ra?e	49324	193282	same as old Berkeley ra?g
	ra?f	49324	82080	same as 4.2BSD ra?g
	ra?g	49910	192696
	ra?h	131404	111202	same as 4.2BSD

.Tn RA81 No partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	16422	66880
	ra?c	0	891072
	ra?d	375564	15884
	ra?e	391986	307200
	ra?f	699720	191352
	ra?g	375564	515508
	ra?h	83538	291346

.Tn RA81 No partitions with 4.2BSD-compatible partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	16422	66880
	ra?c	0	891072
	ra?d	49324	82080	same as 4.2BSD ra?g
	ra?e	131404	759668	same as 4.2BSD ra?h
	ra?f	412490	478582	same as 4.2BSD ra?f
	ra?g	375564	515508
	ra?h	83538	291346

.Tn RA82 No partitions
.Sy	disk	start	length
	ra?a	0	15884
	ra?b	16245	66880
	ra?c	0	1135554
	ra?d	375345	15884
	ra?e	391590	307200
	ra?f	669390	466164
	ra?g	375345	760209
	ra?h	83790	291346
.El
.Pp
The ra?a partition is normally used for the root file system, the ra?b
partition as a paging area, and the ra?c partition for pack-to-pack
copying (it maps the entire disk).
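.Pp
Because several default partitions deliberately share sectors, it can help
to check a layout before using it.
The C sketch below encodes the default RA81 table from this section and
tests two properties: that every partition ends within the whole-disk ra?c
partition, and whether two given partitions overlap.
The structure and function names are hypothetical and are not part of
the driver.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a partition table: letter, first sector, sector count. */
struct ra_partition {
    char name;
    long start;
    long length;
};

/* Default RA81 layout, copied from the table in this section. */
static const struct ra_partition ra81_parts[] = {
    {'a', 0, 15884},       {'b', 16422, 66880},
    {'c', 0, 891072},      {'d', 375564, 15884},
    {'e', 391986, 307200}, {'f', 699720, 191352},
    {'g', 375564, 515508}, {'h', 83538, 291346},
};

/* First sector past the end of partition p. */
long ra_part_end(const struct ra_partition *p)
{
    return p->start + p->length;
}

/* Return 1 if every entry lies inside the 'c' partition, else 0. */
int ra_parts_fit(const struct ra_partition *tab, size_t n)
{
    const struct ra_partition *whole = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].name == 'c')
            whole = &tab[i];
    if (whole == NULL)
        return 0;
    for (i = 0; i < n; i++)
        if (tab[i].start < whole->start ||
            ra_part_end(&tab[i]) > ra_part_end(whole))
            return 0;
    return 1;
}

/* Return 1 if partitions a and b share at least one sector. */
int ra_parts_overlap(const struct ra_partition *a,
    const struct ra_partition *b)
{
    return a->start < ra_part_end(b) && b->start < ra_part_end(a);
}
```

With this table, ra?a and ra?b do not overlap, while ra?d and ra?g do,
so those two should not both hold file systems at the same time.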
.Sh FILES
.Bl -tag -width /dev/rra[0-9][a-h] -compact
.It Pa /dev/ra[0-9][a-h]
.It Pa /dev/rra[0-9][a-h]
.El
.Sh DIAGNOSTICS
.Bl -diag
.It "panic: udaslave"
No command packets were available while the driver was looking
for disk drives.  The controller is not extending enough credits
to use the drives.
.Pp
.It "uda%d: no response to Get Unit Status request"
A disk drive was found, but did not respond to a status request.
This is either a hardware problem or someone pulling unit number
plugs very fast.
.Pp
.It "uda%d: unit %d off line"
While searching for drives, the controller found one that
seems to be manually disabled.  It is ignored.
.Pp
.It "uda%d: unable to get unit status"
Something went wrong while trying to determine the status of
a disk drive.  This is followed by an error detail.
.Pp
.It uda%d: unit %d, next %d
This probably never happens, but I wanted to know if it did.  I
have no idea what one should do about it.
.Pp
.It "uda%d: cannot handle unit number %d (max is %d)"
The controller found a drive whose unit number is too large.
Valid unit numbers are those in the range [0..7].
.Pp
.It "ra%d: don't have a partition table for %s; using (s,t,c)=(%d,%d,%d)"
The controller found a drive whose media identifier (e.g. `RA 25')
does not have a default partition table.  A temporary partition
table containing only an `a' partition has been created covering
the entire disk, which has the indicated numbers of sectors per
track (s), tracks per cylinder (t), and total cylinders (c).
Give the pack a label with the
.Xr disklabel 8
utility.
.Pp
.It "uda%d: uballoc map failed"
Unibus resource map allocation failed during initialisation.  This
can only happen if you have 496 devices on a Unibus.
.Pp
.It uda%d: timeout during init
The controller did not initialise within ten seconds.  A hardware
problem, but it sometimes goes away if you try again.
.Pp
.It uda%d: init failed, sa=%b
The controller refused to initialise.
.Pp
.It uda%d: controller hung
The controller never finished initialisation.  Retrying may sometimes
fix it.
.Pp
.It ra%d: drive will not come on line
The drive will not come on line, probably because it is spun down.
This should be preceded by a message giving details as to why the
drive stayed off line.
.Pp
.It uda%d: still hung
When the controller hangs, the driver occasionally tries to reinitialise
it.  This means it just tried, without success.
.Pp
.It panic: udastart: bp==NULL
A bug in the driver has put an empty drive queue on a controller queue.
.Pp
.It uda%d: command ring too small
If you increase
.Dv NCMDL2 ,
you may see a performance improvement.
(See
.Pa /sys/vaxuba/uda.c . )
.Pp
.It panic: udastart
A drive was found marked for status or on-line functions while performing
status or on-line functions.  This indicates a bug in the driver.
.Pp
.It "uda%d: controller error, sa=0%o (%s)"
The controller reported an error.  The error code is printed in
octal, along with a short description if the code is known (see the
.%T UDA50 Maintenance Guide ,
.Tn DEC
part number
.Tn AA-M185B-TC ,
pp. 18-22).
If this occurs during normal
operation, the driver will reset the controller and retry pending
.Tn I/O .
If
it occurs during configuration, the controller may be ignored.
.Pp
.It uda%d: stray intr
The controller interrupted when it should have stayed quiet.  The
interrupt has been ignored.
.Pp
.It "uda%d: init step %d failed, sa=%b"
The controller reported an error during the named initialisation step.
The driver will retry initialisation later.
.Pp
.It uda%d: version %d model %d
An informational message giving the revision level of the controller.
.Pp
.It uda%d: DMA burst size set to %d
An informational message showing the
.Tn DMA
burst size, in words.
.Pp
.It panic: udaintr
Indicates a bug in the generic
.Tn MSCP
code.
.Pp
.It uda%d: driver bug, state %d
The driver has a bogus value for the controller state.  Something
is quite wrong.  This is immediately followed by a `panic: udastate'.
.Pp
.It uda%d: purge bdp %d
A benign message tracing BDP purges.  I have been trying to figure
out what BDP purges are for.  You might want to comment out this
call to
.Fn log
in
.Pa /sys/vaxuba/uda.c .
.Pp
.It uda%d: SETCTLRC failed:  `detail'
The Set Controller Characteristics command (the last part of the
controller initialisation sequence) failed.  The
.Em detail
message tells why.
.Pp
.It "uda%d: attempt to bring ra%d on line failed:  `detail'"
The drive could not be brought on line.  The
.Em detail
message tells why.
.Pp
.It uda%d: ra%d: unknown type %d
The type index of the named drive is not known to the driver, so the
drive will be ignored.
.Pp
.It "ra%d: changed types! was %d now %d"
A drive somehow changed from one kind to another, e.g., from an
.Tn RA80
to an
.Tn RA60 .
The numbers printed are the encoded media identifiers (see
.Ao Pa vax/mscp.h Ac
for the encoding).
The driver believes the new type.
.Pp
.It "ra%d: uda%d, unit %d, size = %d sectors"
The named drive is on the indicated controller as the given unit,
and has that many sectors of user-file area.  This is printed
during configuration.
.Pp
.It "uda%d: attempt to get status for ra%d failed:  `detail'"
A status request failed.  The
.Em detail
message should tell why.
.Pp
.It ra%d: bad block report: %d
The drive has reported the given block as bad.  If there are multiple
bad blocks, the drive will report only the first; in this case this
message will be followed by `+ others'.  Get
.Tn DEC
to forward the
block with
.Tn EVRLK .
.Pp
.It ra%d: serious exception reported
I have no idea what this really means.
.Pp
.It panic: udareplace
The controller reported completion of a
.Tn REPLACE
operation.  The
driver never issues any
.Tn REPLACE Ns s ,
so something is wrong.
.Pp
.It panic: udabb
The controller reported completion of bad block related
.Tn I/O .
The
driver never issues any such, so something is wrong.
.Pp
.It uda%d: lost interrupt
The controller has gone out to lunch, and is being reset to try to bring
it back.
.Pp
.It panic: mscp_go: AEB_MAX_BP too small
You defined
.Dv AVOID_EMULEX_BUG
and increased
.Dv NCMDL2
and Emulex has
new firmware.  Raise
.Dv AEB_MAX_BP
or turn off
.Dv AVOID_EMULEX_BUG .
.Pp
.It "uda%d: unit %d: unknown message type 0x%x ignored"
The controller responded with a mysterious message type.  See
.Pa /sys/vax/mscp.h
for a list of known message types.  This is probably
a controller hardware problem.
.Pp
.It "uda%d: unit %d out of range"
The disk drive unit number (the unit plug) is higher than the
maximum number the driver allows (currently 7).
.Pp
.It "uda%d: unit %d not configured, message ignored"
The named disk drive has announced its presence to the controller,
but was not, or cannot now be, configured into the running system.
.Em Message
is one of `available attention' (an `I am here' message) or
`stray response op 0x%x status 0x%x' (anything else).
.Pp
.It ra%d: bad lbn (%d)?
The drive has reported an invalid command error, probably due to an
invalid block number.  If the lbn value is very much greater than the
size reported by the drive, this is the problem.  It is probably due to
an improperly configured partition table.  Other invalid commands
indicate a bug in the driver, or hardware trouble.
.Pp
.It ra%d: duplicate ONLINE ignored
The drive has come on-line while already on-line.  This condition
can probably be ignored (and has been).
.Pp
.It ra%d: io done, but no buffer?
Hardware trouble, or a bug; the drive has finished an
.Tn I/O
request,
but the response has an invalid (zero) command reference number.
.Pp
.It "Emulex SC41/MS screwup: uda%d, got %d correct, then changed 0x%x to 0x%x"
You turned on
.Dv AVOID_EMULEX_BUG ,
and the driver successfully
avoided the bug.  The number of correctly-handled requests is
reported, along with the expected and actual values relating to
the bug being avoided.
.Pp
.It panic: unrecoverable Emulex screwup
You turned on
.Dv AVOID_EMULEX_BUG ,
but Emulex was too clever and
avoided the avoidance.  Try turning on
.Dv MSCP_PARANOIA
instead.
.Pp
.It uda%d: bad response packet ignored
You turned on
.Dv MSCP_PARANOIA ,
and the driver caught the controller in
a lie.  The lie has been ignored, and the controller will soon be
reset (after a `lost' interrupt).  This is followed by a hex dump of
the offending packet.
.Pp
.It ra%d: bogus REPLACE end
The drive has reported finishing a bad sector replacement, but the
driver never issues bad sector replacement commands.  The report
is ignored.  This is likely a hardware problem.
.Pp
.It "ra%d: unknown opcode 0x%x status 0x%x ignored"
The drive has reported something that the driver cannot understand.
Perhaps
.Tn DEC
has been inventive, or perhaps your hardware is ill.
This is followed by a hex dump of the offending packet.
.Pp
.It "ra%d%c: hard error %sing fsbn %d [of %d-%d] (ra%d bn %d cn %d tn %d sn %d)."
An unrecoverable error occurred during transfer of the specified
filesystem block number(s),
which are logical block numbers on the indicated partition.
If the transfer involved multiple blocks, the block range is printed as well.
The parenthesized fields list the actual disk sector number
relative to the beginning of the drive,
as well as the cylinder, track and sector number of the block.
.Pp
.It uda%d: %s error datagram
The controller has reported some kind of error, either `hard'
(unrecoverable) or `soft' (recoverable).  If the controller is going on
(attempting to fix the problem), this message includes the remark
`(continuing)'.  Emulex controllers wrongly claim that all soft errors
are hard errors.  This message may be followed by
one of the following five messages, depending on its type, and will always
be followed by a failure detail message (also listed below).
.Bd -filled -offset indent
.It memory addr 0x%x
A host memory access error; this is the address that could not be
read.
.Pp
.It "unit %d: level %d retry %d, %s %d"
A typical disk error; the retry count and error recovery levels are
printed, along with the block type (`lbn', or logical block; or `rbn',
or replacement block) and number.  If the string is something else,
.Tn DEC
has been clever, or your hardware has gone to Australia for vacation
(unless you live there; then it might be in New Zealand, or Brazil).
.Pp
.It unit %d: %s %d
Also a disk error, but an `SDI' error, whatever that is.  (I doubt
it has anything to do with Ronald Reagan.)  This lists the block
type (`lbn' or `rbn') and number.  This is followed by a second
message indicating a microprocessor error code and a front panel
code.  These latter codes are drive-specific, and are intended to
be used by field service as an aid in locating failing hardware.
The codes for RA81s can be found in the
.%T RA81 Maintenance Guide ,
DEC order number AA-M879A-TC, in appendices E and F.
.Pp
.It "unit %d: small disk error, cyl %d"
Yet another kind of disk error, but for small disks.  (`That's what
it says, guv'nor.  Dunnask me what it means.')
.Pp
.It "unit %d: unknown error, format 0x%x"
A mysterious error: the given format code is not known.
.Ed
.Pp
The detail messages are as follows:
.Bd -filled -offset indent
.It success (%s) (code 0, subcode %d)
Everything worked, but the controller thought it would let you know
that something went wrong.  No matter what subcode, this can probably
be ignored.
.Pp
.It "invalid command (%s) (code 1, subcode %d)"
This probably cannot occur unless the hardware is out; %s should be
`invalid msg length', meaning some command was too short or too long.
.Pp
.It "command aborted (unknown subcode) (code 2, subcode %d)"
This should never occur, as the driver never aborts commands.
.Pp
.It "unit offline (%s) (code 3, subcode %d)"
The drive is offline, either because it is not around (`unknown
drive'), stopped (`not mounted'), out of order (`inoperative'), has the
same unit number as some other drive (`duplicate'), or has been
disabled for diagnostics (`in diagnosis').
.Pp
.It "unit available (unknown subcode) (code 4, subcode %d)"
The controller has decided to report a perfectly normal event as
an error.  (Why?)
.Pp
.It "media format error (%s) (code 5, subcode %d)"
The drive cannot be used without reformatting.  The Format Control
Table cannot be read (`fct unread - edc'), there is a bad sector
header (`invalid sector header'), the drive is not set for 512-byte
sectors (`not 512 sectors'), the drive is not formatted (`not formatted'),
or the
.Tn FCT
has an uncorrectable
.Tn ECC
error (`fct ecc').
.Pp
.It "write protected (%s) (code 6, subcode %d)"
The drive is write protected, either by the front panel switch
(`hardware') or via the driver (`software').  The driver never
sets software write protect.
.Pp
.It "compare error (unknown subcode) (code 7, subcode %d)"
A compare operation showed some sort of difference.  The driver
never uses compare operations.
.Pp
.It "data error (%s) (code 8, subcode %d)"
Something went wrong reading or writing a data sector.  A `forced
error' is a software-asserted error used to mark a sector that contains
suspect data.  Rewriting the sector will clear the forced error.  This
is normally set only during bad block replacement, and the driver does
no bad block replacement, so these should not occur.  A `header
compare' error probably means the block is shot.  A `sync timeout'
presumably has something to do with sector synchronisation.
An `uncorrectable ecc' error is an ordinary data error that cannot
be fixed via
.Tn ECC
logic.  A `%d symbol ecc' error is a data error
that can be (and presumably has been) corrected by the
.Tn ECC
logic.
It might indicate a sector that is imperfect but usable, or that
is starting to go bad.  If any of these errors recur, the sector
may need to be replaced.
.Pp
.It "host buffer access error (%s) (code %d, subcode %d)"
Something went wrong while trying to copy data to or from the host
(VAX).  The subcode is one of `odd xfer addr', `odd xfer count',
`non-exist. memory', or `memory parity'.  The first two could be a
software glitch; the last two indicate hardware problems.
.Pp
.It "controller error (%s) (code %d, subcode %d)"
The controller has detected a hardware error in itself.  A
`serdes overrun' is a serialiser / deserialiser overrun; `edc'
probably stands for `error detection code'; and `inconsistent
internal data struct' is obvious.
.Pp
.It "drive error (%s) (code %d, subcode %d)"
Either the controller or the drive has detected a hardware error
in the drive.  I am not sure what an `sdi command timeout' is, but
these seem to occur benignly on occasion.  A `ctlr detected protocol'
error means that the controller and drive do not agree on a protocol;
this could be a cabling problem, or a version mismatch.  A `positioner'
error means the drive seek hardware is ailing; `lost rd/wr ready'
means the drive read/write logic is sick; and `drive clock dropout'
means that the drive clock logic is bad, or the media is hopelessly
scrambled.  I have no idea what `lost recvr ready' means.  A `drive
detected error' is a catch-all for drive hardware trouble; `ctlr
detected pulse or parity' errors are often caused by cabling problems.
.Ed
.El
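.Pp
The `hard error' diagnostic listed earlier reports one block in three
forms: as a partition-relative fsbn, as a drive-relative bn, and as a
cylinder, track and sector triple.
The C sketch below shows how these could relate under the conventional
linear mapping.
This is an assumption: the driver's own arithmetic is not shown in this
page, and the type and function names are hypothetical.
The geometry is supplied by the caller, for example from the (s,t,c)
values in the `don't have a partition table' message or from the disk label.

```c
#include <assert.h>

/* Drive geometry: sectors per track, tracks per cylinder, cylinders. */
struct ra_geom {
    long nsectors;
    long ntracks;
    long ncylinders;
};

/* Cylinder, track and sector numbers, named as in the diagnostic. */
struct ra_chs {
    long cn;
    long tn;
    long sn;
};

/* Total number of sectors implied by geometry g. */
long ra_total_sectors(const struct ra_geom *g)
{
    return g->nsectors * g->ntracks * g->ncylinders;
}

/* Drive-relative block number of block fsbn in a partition
 * that starts at sector part_start. */
long ra_fsbn_to_bn(long part_start, long fsbn)
{
    return part_start + fsbn;
}

/* Split a drive-relative block number into cylinder/track/sector,
 * assuming blocks are laid out linearly with no skew or sparing. */
struct ra_chs ra_bn_to_chs(const struct ra_geom *g, long bn)
{
    struct ra_chs r;
    long per_cyl = g->nsectors * g->ntracks;

    assert(per_cyl > 0 && bn >= 0);
    r.cn = bn / per_cyl;
    r.tn = (bn % per_cyl) / g->nsectors;
    r.sn = bn % g->nsectors;
    return r;
}
```

For instance, taking an assumed geometry of 51 sectors, 14 tracks and 1248
cylinders (which multiplies out to the 891072 sectors of the RA81 ra?c
partition), block 100 of a partition starting at sector 16422 is
drive block 16522, which this mapping places at cylinder 23, track 1,
sector 49.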
.Sh SEE ALSO
.Xr disklabel 5 ,
.Xr disklabel 8
.Sh HISTORY
The
.Nm
driver appeared in
.Bx 4.2 .