.\" $OpenBSD: crash.8,v 1.31 2009/04/02 13:18:24 jmc Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)crash.8	6.5 (Berkeley) 4/20/91
.\"
.Dd $Mdocdate: April 2 2009 $
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Bd -literal -offset indent
panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
.Pp
If you wish to report this panic, you should include the output of
the
.Ic ps
and
.Ic trace
commands.
Unless the
.Sq ddb.log
sysctl has been disabled, anything output to the screen will be
appended to the system message buffer, from where it may be
possible to retrieve it through the
.Xr dmesg 8
command after a warm reboot.
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system then
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine dependent way) the system
will simply halt at this point.
.Pp
Upon rebooting, and
unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will copy the previously written dump into
.Pa /var/crash
using
.Xr savecore 8 ,
before resuming multi-user operations.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory
which
can manifest itself in different ways.
Here are the messages which are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and can be found listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and the pc is the program counter
at the time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding
to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
usually can produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
.Ss Analyzing a dump
When the system crashes it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
you should run
.Xr gdb 1 ,
loading in the images with the following commands:
.Bd -literal -offset indent
# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd4.6".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kvm /var/crash/bsd.0.core
.Ed
.Pp
[Note that the
.Dq kvm
target is currently only supported by
.Xr gdb 1
on some architectures.]
.Pp
After this, you can use the
.Ic where
command to show a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful if you had previously
configured your kernel to include debugging symbols with
.Sq makeoptions DEBUG="-g"
.Pq see Xr options 4
(though you will not be able to boot an unstripped kernel since it uses too
much memory).
In this case, you should use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
thus allowing
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
This is where the symbolic information is obtained from,
which means the image cannot be stripped.
In some cases, using a
.Pa bsd.gdb
version of the kernel can assist even more.
.It Fl M Ar core
Normally this
.Ar core
is an image produced by
.Xr savecore 8
but it can be
.Pa /dev/mem
too, if you are looking at the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
has renamed the
.Fl M
argument to be
.Fl C
instead.
.Pp
Examples of use:
.Bd -literal -offset indent
# ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints each process's
.Li struct proc
address, but with the value of KERNBASE masked off.
This is very useful information if you are analyzing process contexts in
.Xr gdb 1 .
You need to add KERNBASE back; that value can be found in
.Pa /usr/include/$ARCH/param.h .
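.Pp
As a quick sketch of that arithmetic, assuming the i386 KERNBASE value
of 0xd0000000 and a hypothetical masked
.Fl O Ar paddr
value of 0x69dada0 (verify the real KERNBASE for your architecture in
.Pa /usr/include/$ARCH/param.h ) ,
the kernel virtual address can be recovered with shell arithmetic:
.Bd -literal -offset indent
# printf '0x%x\en' $((0xd0000000 + 0x69dada0))
0xd69dada0
.Ed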
.Bd -literal -offset indent
# vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This analyzes memory allocations at the time of the crash.
Perhaps some resource was starving the system?
.Ss Analyzing a live kernel
Like the tools mentioned above,
.Xr gdb 1
can be used to analyze a live system as well.
This can be accomplished by not specifying a crash dump when selecting the
.Dq kvm
target:
.Bd -literal -offset indent
(gdb) target kvm
.Ed
.Pp
It is possible to inspect processes that entered the kernel by
specifying a process's
.Li struct proc
address to the
.Ic kvm proc
command:
.Bd -literal -offset indent
(gdb) kvm proc 0xd69dada0
#0  0xd0355d91 in sleep_finish (sls=0x0, do_sleep=0)
    at ../../../../kern/kern_synch.c:217
217             mi_switch();
.Ed
.Pp
After this, the
.Ic where
command will show a trace of procedure calls, right back to where the
selected process entered the kernel.
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The point of the crash usually looks something like this:
"function+0x4711".
.Pp
Find the function in the sources; let's say that the function is in "foo.c".
.Pp
Go to the kernel build directory, e.g.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal -offset indent
# rm foo.o
# make DEBUG=-g foo.o
# objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
The function will look something like this:
.Bd -literal -offset indent
0: 17 47 11 42   foo %x, bar, %y
4: foo bar allan %kaka
8: XXXX boink %bloyt
etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case it's 4711).
.Pp
When reporting data collected in this way, include ~20 lines before and ~10
lines after the offset from the objdump output in the crash report, as well
as the output of
.Xr ddb 4 Ns 's
"show registers" command.
It's important that the output from objdump includes at least two or
three lines of C code.
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel,
and need help in further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description including the entire session
from
.Xr gdb 1 .
.Sh SEE ALSO
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8