.\" Copyright (c) 1990, 1991 Regents of the University of California.
.\" All rights reserved.
.\"
.\" %sccs.include.redist.man%
.\"
.\" @(#)crash.8 5.2 (Berkeley) 03/16/91
.\"
.Dd
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd UNIX system failures
.Sh DESCRIPTION
This section explains a bit about system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Bd -ragged -offset indent
panic: why i gave up the ghost
.Ed
.Pp
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.Xr reboot 8 .
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.Pp
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure, which
can reflect itself in different ways. Here are the messages which
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.Pp
.Bl -tag -width Ds -compact
.It Sy iinit
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as root filesystem.
Usually, an alternate copy of the system binary or an alternate root
filesystem can be used to bring up the system to investigate.
.Pp
.It Sy "Can't exec /etc/init"
This is not a panic message, as reboots are likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /etc/init
forbids execution.
.Pp
.It Sy "IO err in push"
.It Sy "hard IO err in swap"
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.Pp
.It Sy "realloccg: bad optim"
.It Sy "ialloc: dup alloc"
.It Sy "alloccgblk:cyl groups corrupted"
.It Sy "ialloccg: map corrupted"
.It Sy "free: freeing free block"
.It Sy "free: freeing free frag"
.It Sy "ifree: freeing free inode"
.It Sy "alloccg: map corrupted"
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair damaged filesystems
after a crash, hardware failures, or other condition that should not
normally occur.
A filesystem check will normally correct the problem.
.Pp
.It Sy "timeout table overflow"
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.Pp
.It Sy "trap type %d, code = %x, v = %x"
An unexpected trap has occurred within the system; the trap types are:
.Bl -column xxxx -offset indent
0	bus error
1	address error
2	illegal instruction
3	divide by zero
.No 4\t Em chk No instruction
.No 5\t Em trapv No instruction
6	privileged instruction
7	trace trap
8	MMU fault
9	simulated software interrupt
10	format error
11	FP coprocessor fault
12	coprocessor fault
13	simulated AST
.El
.Pp
The favorite trap type in system crashes is trap type 8,
indicating a wild reference.
``code'' (hex) is the concatenation of the
MMU
status register
(see <hp300/cpu.h>)
in the high 16 bits and the 68020 special status word
(see the 68020 manual, page 6-17)
in the low 16.
``v'' (hex) is the virtual address which caused the fault.
Additionally, the kernel will dump about a screenful of semi-useful
information.
``pid'' (decimal) is the process id of the process running at the
time of the exception.
Note that if we panic in an interrupt routine,
this process may not be related to the panic.
``ps'' (hex) is the 68020 processor status register ``ps''.
``pc'' (hex) is the value of the program counter saved
on the hardware exception frame.
It may
.Em not
be the PC of the instruction causing the fault.
``sfc'' and ``dfc'' (hex) are the 68020 source/destination function codes.
They should always be one.
``p0'' and ``p1'' are the
VAX-like
region registers.
They are of the form:
.Pp
.Bd -ragged -offset indent
<length> '@' <kernel VA>
.Ed
.Pp
where both are in hex.
Following these values is a dump of the processor registers (hex).
Finally, there is a dump of the stack (user/kernel) at the time of the offense.
.Pp
.It Sy "init died"
The system initialization process has exited. This is bad news, as no new
users will then be able to log in.
Rebooting is the only fix, so the
system just does it right away.
.Pp
.It Sy "out of mbufs: map full"
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
.Pp
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
usually the same as the primary swap
area. After the system is rebooted, the program
.Xr savecore 8
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal. See
.Xr savecore 8
for details.
.Pp
To analyze a dump you should begin by running
.Xr adb 1
with the
.Fl k
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
Normally the command
``$c''
will provide a stack trace from the point of
the crash and this will provide a clue as to
what went wrong.
For more details consult
.%T "Using ADB to Debug the UNIX Kernel" .
.Sh SEE ALSO
.Xr adb 1 ,
.Xr reboot 8
.Rs
.%T "MC68020 32-bit Microprocessor User's Manual"
.Re
.Rs
.%T "Using ADB to Debug the UNIX Kernel"
.Re
.Rs
.%T "4.3BSD for the HP300"
.Re
.Sh HISTORY
A
.Nm
man page appeared in Version 6 AT&T UNIX.