.\" $OpenBSD: crash.8,v 1.14 2001/10/05 14:45:54 mpech Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"      This product includes software developed by the University of
.\"      California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" from: @(#)crash.8 6.5 (Berkeley) 4/20/91
.\"
.Dd February 23, 2000
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Pp
.Bd -literal
 panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine-dependent way) the system
will simply halt at this point.
.Pp
Upon rebooting, and
unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will copy the previously written dump into
.Pa /var/crash
using
.Xr savecore 8 ,
before resuming multi-user operations.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory
which can manifest itself in different ways.
Here are the messages which are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It timeout table overflow
.ns
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and can be found listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and the pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
can usually produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
.Ss Analyzing a dump
When the system crashes it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
you should run
.Xr gdb 1 ,
loading in the images with the following commands:
.Pp
.Bd -literal -offset indent
# gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.
Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd2.8".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kcore /var/crash/bsd.0.core
.Ed
.Pp
After this, you can use the
.Ic where
command to show the trace of procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful if you had previously
configured your kernel to include debugging symbols with
.Sq makeoptions DEBUG=-ggdb
.Pq see Xr options 4
(though you will not be able to boot an unstripped kernel since it uses too
much memory).
In this case, you should use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
thus allowing
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
The symbolic information is taken from this image,
which means the image cannot be stripped.
In some cases, using a
.Pa bsd.gdb
version of the kernel can assist even more.
.It Fl M Ar core
Normally this
.Ar core
is an image produced by
.Xr savecore 8
but it can be
.Pa /dev/mem
too, if you are looking at the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr trsp 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
has renamed the
.Fl M
argument to be
.Fl C
instead.
.Pp
Examples of use:
.Pp
.Bd -literal
 # ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints each process'
.Li struct proc
address, but with the value of KERNBASE masked off.
This is very useful information if you are analyzing process contexts in
.Xr gdb 1 .
You need to add KERNBASE back, though; its value can be found in
.Pa /usr/include/$ARCH/param.h .
.Pp
.Bd -literal
 # vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This analyzes memory allocations at the time of the crash.
Perhaps some resource was starving the system?
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The point of the crash usually looks something like this: "function+0x4711".
.Pp
Find the function in the sources; let's say that the function is in "foo.c".
.Pp
Go to the kernel build directory, i.e.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal
 # rm foo.o
 # make -n foo.o | sed 's,-c,-g -c,' | sh
 # objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
The function will look something like this:
.Pp
.Bd -literal
 0:   17 47 11 42   foo %x, bar, %y
 4:   foo bar       allan %kaka
 8:   XXXX          boink %bloyt
 etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case it's 4711).
.Pp
When reporting data collected in this way, include ~20 lines before and ~10
lines after the offset from the objdump output in the crash report, as well
as the output of
.Xr ddb 4 Ns 's
"show registers" command.
It's important that the output from objdump includes at least two or
three lines of C code.
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel,
and need help in further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description including the entire session
from
.Xr gdb 1 .
.Sh "SEE ALSO"
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8