.\"	$OpenBSD: crash.8,v 1.33 2010/11/08 15:52:05 sobrado Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)crash.8	6.5 (Berkeley) 4/20/91
.\"
.Dd $Mdocdate: November 8 2010 $
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This manual page explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily, that is, panics, it prints a message of the form
.Bd -literal -offset indent
panic: why i gave up the ghost
.Ed
on the console and enters the kernel debugger,
.Xr ddb 4 .
.Pp
If you wish to report this panic, include the output of the
.Xr ddb 4
.Ic ps
and
.Ic trace
commands.
Unless the
.Sq ddb.log
sysctl has been disabled, all debugger output is also
appended to the system message buffer, from which it can often
be retrieved with
.Xr dmesg 8
after a warm reboot.
.Pp
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
the system dumps the contents of physical memory
onto a mass storage device.
The device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
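.Pp
The commands named above might be used at the
.Xr ddb 4
prompt as follows: first gather the information for a report, then request a crash dump.
This is only a sketch.
Output is omitted, and the prompt shown is an assumption because it varies between kernels.
.Bd -literal -offset indent
ddb> trace
ddb> ps
ddb> boot dump
.Ed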
.Pp
After the dump has been written, the system
invokes the automatic reboot procedure as
described in
.Xr reboot 8 .
If auto-reboot is disabled (in a machine-dependent way), the system
simply halts at this point.
.Pp
Upon rebooting, unless an unexpected inconsistency caused by hardware
or software failure is found in the file systems, the system
copies the previously written dump into
.Pa /var/crash
using
.Xr savecore 8
before resuming multi-user operation.
.Ss Causes of system failure
The system has a large number of internal consistency checks.
If one of these fails, the system panics with a very short message indicating
which check failed.
In many instances, this is the name of the routine that detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires reading the
kernel source code.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory ,
which can manifest itself in different ways.
The messages most likely to be seen are listed below, with some hints as to their causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates file system problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root file system is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system.
The trap types are machine-dependent and are listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic.
If the cause is a kernel bug,
the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
that correspond to the pc value.
If that is not enough to suggest the nature of the problem,
a more detailed examination of the system state at the time of the trap
usually produces an explanation.
.It init died
The system initialization process has exited.
This is fatal, because no new users would then be able to log in.
Rebooting is the only fix, so the system does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost.
Rather than allow the system to degrade slowly, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
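.Pp
For a
.Dq trap type
panic,
.Xr ddb 4
can disassemble the instruction at the reported pc value with its examine command and the i modifier, written
.Ic x/i .
The pc value below is hypothetical.
The output, omitted here, should show the address as a function name plus offset, followed by the instruction.
.Bd -literal -offset indent
ddb> x/i 0xd0355d91
.Ed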
.Ss Analyzing a dump
When the system crashes, it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
run
.Xr gdb 1
and load the images with the following commands:
.Bd -literal -offset indent
# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd4.6".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kvm /var/crash/bsd.0.core
.Ed
.Pp
Note that
.Xr gdb 1
currently supports the
.Dq kvm
target only on some architectures.
.Pp
After this, the
.Ic where
command shows a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it helps to have configured the kernel
beforehand to include debugging symbols with
.Sq makeoptions DEBUG="-g"
.Pq see Xr options 4 .
An unstripped kernel cannot be booted because it uses too much memory.
In this case, load the unstripped
.Pa bsd.gdb
into
.Xr gdb 1
instead of
.Pa bsd.0 ,
so that it can show symbolic names for addresses and line numbers
from the source.
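.Pp
Assuming a custom kernel built with debugging symbols in a hypothetical build directory
.Pa /sys/arch/ARCH/compile/MYKERNEL ,
a session using
.Pa bsd.gdb
might look like the following.
The
.Pa bsd.gdb
file must come from the same build as the kernel that crashed, or the symbols will not match the dump.
.Bd -literal -offset indent
(gdb) file /sys/arch/ARCH/compile/MYKERNEL/bsd.gdb
(gdb) target kvm /var/crash/bsd.0.core
(gdb) where
.Ed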
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images.
Most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl M Ar core
Normally
.Ar core
is an image produced by
.Xr savecore 8 ,
but it can also be
.Pa /dev/mem
when examining the live system.
.It Fl N Ar system
Specifies the kernel
.Ar system
image from which symbolic information is read,
which means the image must not be stripped.
In some cases, using the
.Pa bsd.gdb
version of the kernel helps even more.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr trpt 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
uses
.Fl C
instead of
.Fl M .
.Pp
Examples of use:
.Bd -literal -offset indent
# ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints the address of each process'
.Li struct proc ,
with the value of KERNBASE masked off.
This information is very useful when analyzing process contexts in
.Xr gdb 1 ,
but KERNBASE must first be added back.
Its value can be found in
.Pa /usr/include/$ARCH/param.h .
.Bd -literal -offset indent
# vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This reports kernel memory allocation statistics at the time of the crash,
which can show whether some resource was starving the system.
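.Pp
Adding KERNBASE back to an address printed by
.Fl O Ar paddr
is plain arithmetic.
The following
.Xr sh 1
sketch assumes, for illustration only, a KERNBASE of 0xd0000000 and a hypothetical paddr value.
Check
.Pa /usr/include/$ARCH/param.h
for the real value on your architecture.
.Bd -literal -offset indent
#!/bin/sh
# Assumption for illustration only: KERNBASE is 0xd0000000.
# Check /usr/include/$ARCH/param.h for the real value.
KERNBASE=0xd0000000
# Hypothetical value printed by "ps -O paddr" (KERNBASE masked off).
paddr=0x069dada0
# Add KERNBASE back to get the kernel address of the struct proc.
addr=$(printf '0x%x' $(($paddr + $KERNBASE)))
echo "$addr"
.Ed
.Pp
The resulting address is the kind of value expected by the
.Ic kvm proc
command of
.Xr gdb 1 .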
.Ss Analyzing a live kernel
Like the tools mentioned above,
.Xr gdb 1
can be used to analyze a live system as well.
This can be accomplished by not specifying a crash dump when selecting the
.Dq kvm
target:
.Bd -literal -offset indent
(gdb) target kvm
.Ed
.Pp
It is possible to inspect processes that entered the kernel by
specifying a process'
.Li struct proc
address to the
.Ic kvm proc
command:
.Bd -literal -offset indent
(gdb) kvm proc 0xd69dada0
#0  0xd0355d91 in sleep_finish (sls=0x0, do_sleep=0)
    at ../../../../kern/kern_synch.c:217
217                     mi_switch();
.Ed
.Pp
After this, the
.Ic where
command will show a trace of procedure calls, right back to where the
selected process entered the kernel.
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4 ,
find the function that caused the crash.
It is either the function at the top of the traceback or the function
just below the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The location of the crash is usually printed in the form
.Dq function+0x4711 .
.Pp
Find the function in the kernel sources.
Suppose it is in
.Pa foo.c .
.Pp
Go to the kernel build directory, e.g.,
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Do the following:
.Bd -literal -offset indent
# rm foo.o
# make DEBUG=-g foo.o
# objdump -S foo.o | less
.Ed
.Pp
Find the function in the output.
It will look something like this:
.Bd -literal -offset indent
0: 17 47 11 42         foo %x, bar, %y
4: foo bar             allan %kaka
8: XXXX                boink %bloyt
etc.
.Ed
.Pp
The first number is the offset.
Find the offset that you got in the ddb trace
(in this case, 4711).
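.Pp
In a relocatable object such as
.Pa foo.o ,
.Xr objdump 1
prints addresses relative to the start of the text section.
The offset from the ddb trace therefore matches the first number directly only when the function starts at 0, as in the example above.
Otherwise, add the function's start address, shown in its label in the objdump output, to the offset.
The following
.Xr sh 1
sketch performs this arithmetic for a hypothetical location and a hypothetical start address of 0x40.
.Bd -literal -offset indent
#!/bin/sh
# Hypothetical crash location, as printed in a ddb(4) trace.
loc='function+0x4711'
func=${loc%%+*}    # text before the "+": the function name
off=${loc#*+}      # text after the "+": the offset, in hex
# Hypothetical start of the function, read from its
# "00000040 <function>:" label in the objdump -S output.
start=0x40
# objdump prints addresses in hex without a 0x prefix.
addr=$(printf '%x' $(($start + $off)))
echo "$func: look for the line starting with $addr:"
.Ed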
.Pp
When reporting data collected in this way, include about 20 lines before
and about 10 lines after the offset from the objdump output in the crash
report.
Also include the output of the
.Xr ddb 4
.Ic show registers
command.
The objdump excerpt must include at least two or
three lines of C code.
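.Pp
Suppose the function starts at 0, as in the example above, so that the offset 4711 appears literally in the objdump output.
The requested context can then be extracted with the
.Fl B
and
.Fl A
options of
.Xr grep 1 :
.Bd -literal -offset indent
# objdump -S foo.o | grep -B 20 -A 10 ' 4711:'
.Ed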
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel
and need help with further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description, including the entire session
from
.Xr gdb 1 .
.Sh SEE ALSO
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8