.\"	$OpenBSD: crash.8,v 1.14 2001/10/05 14:45:54 mpech Exp $
.\"
.\" Copyright (c) 1980, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	from: @(#)crash.8	6.5 (Berkeley) 4/20/91
.\"
.Dd February 23, 2000
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd system failure and diagnosis
.Sh DESCRIPTION
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily, it prints a message of the form
.Pp
.Bd -literal
        panic: why i gave up the ghost
.Ed
.Pp
on the console and enters the kernel debugger,
.Xr ddb 4 .
If the debugger command
.Ic boot dump
is entered, or if the debugger was not compiled into the kernel, or
the debugger was disabled with
.Xr sysctl 8 ,
then the system dumps the contents of physical memory
onto a mass storage peripheral device.
The particular device used is determined by the
.Sq dumps on
directive in the
.Xr config 8
file used to build the kernel.
.Pp
After the dump has been written, the system invokes the automatic
reboot procedure as described in
.Xr reboot 8 .
If automatic reboot is disabled (in a machine-dependent way), the system
simply halts at this point.
.Pp
Upon rebooting, unless some unexpected inconsistency is found in the
state of the file systems due to hardware or software failure, the system
copies the previously written dump into
.Pa /var/crash
using
.Xr savecore 8
before resuming multi-user operation.
.Ss Causes of system failure
The system has a large number of internal consistency checks; if one
of these fails, it panics with a very short message indicating
which one failed.
In many instances, this is the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure
.Pq e.g., bad memory ,
which can manifest itself in different ways.
The messages most likely to be seen are listed below,
with some hints as to their causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.Bl -tag -width indent
.It no init
This panic message indicates filesystem problems, and reboots are likely
to be futile.
Late in the bootstrap procedure, the system was unable to
locate and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /sbin/init
forbids execution.
.It timeout table overflow
.ns
This really shouldn't be a panic, but until the data structure
involved is made extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are
machine dependent and are listed in
.Pa /sys/arch/ARCH/include/trap.h .
.Pp
The code is the referenced address, and pc is the program counter at the
time of the fault.
Hardware flakiness will sometimes generate this panic, but if the cause
is a kernel bug, the kernel debugger
.Xr ddb 4
can be used to locate the instruction and subroutine inside the kernel
corresponding to the pc value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
can usually produce an explanation.
.It init died
The system initialization process has exited.
This is bad news, as no new users will then be able to log in.
Rebooting is the only fix, so the system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
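.Pp
When a trap panic has been copied from the console, its fields can be
extracted mechanically before the pc value is looked up in
.Xr ddb 4 .
The following C function is a sketch that assumes the message has
exactly the trap type layout listed above; the wording may differ between
architectures, and the name
.Fn parse_trap
is hypothetical.
.Bd -literal -offset indent
#include <stdio.h>

/*
 * Hypothetical helper: parse "trap type %d, code=%x, pc=%x".
 * Returns 0 and fills in the fields on success, -1 otherwise.
 * The %lx conversions accept the hex digits with or without 0x.
 */
int
parse_trap(const char *msg, int *type, unsigned long *code,
    unsigned long *pc)
{
	if (sscanf(msg, "trap type %d, code=%lx, pc=%lx",
	    type, code, pc) != 3)
		return (-1);
	return (0);
}
.Ed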
.Ss Analyzing a dump
When the system crashes, it writes (or at least attempts to write)
an image of memory, including the kernel image, onto the dump device.
On reboot, the kernel image and memory image are separated and preserved in
the directory
.Pa /var/crash .
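.Pp
Because the saved files follow a numbered naming scheme, a program can
locate them without guessing.
The following C fragment is a minimal sketch, assuming that
.Xr savecore 8
stores dump number N as
.Pa bsd.N
(kernel) and
.Pa bsd.N.core
(memory image), as in the
.Pa bsd.0
example below; the function name
.Fn dump_paths
is hypothetical.
.Bd -literal -offset indent
#include <stdio.h>

/*
 * Hypothetical helper: build the kernel and core file names for
 * dump number n in directory dir.  Returns 0 on success, -1 if
 * either name does not fit in its buffer.
 */
int
dump_paths(const char *dir, int n, char *kernel, size_t klen,
    char *core, size_t clen)
{
	int r;

	r = snprintf(kernel, klen, "%s/bsd.%d", dir, n);
	if (r < 0 || (size_t)r >= klen)
		return (-1);
	r = snprintf(core, clen, "%s/bsd.%d.core", dir, n);
	if (r < 0 || (size_t)r >= clen)
		return (-1);
	return (0);
}
.Ed
.Pp
For dump 0 in
.Pa /var/crash ,
this yields the two paths used in the examples below.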
.Pp
To analyze the kernel and memory images preserved as
.Pa bsd.0
and
.Pa bsd.0.core ,
run
.Xr gdb 1
and load the images with the following commands:
.Pp
.Bd -literal -offset indent
# gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.
Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd2.8".
(gdb) file /var/crash/bsd.0
Reading symbols from /var/crash/bsd.0...(no debugging symbols found)...done.
(gdb) target kcore /var/crash/bsd.0.core
.Ed
.Pp
After this, the
.Ic where
command shows a trace of the procedure calls that led to the crash.
.Pp
For custom-built kernels, it is helpful to have configured the kernel
beforehand to include debugging symbols with
.Sq makeoptions DEBUG=-ggdb
.Pq see Xr options 4 .
Such an unstripped kernel cannot itself be booted, since it uses too
much memory, but it can still be used for analysis.
In this case, use
.Pa bsd.gdb
instead of
.Pa bsd.0 ,
which allows
.Xr gdb 1
to show symbolic names for addresses and line numbers from the source.
.Pp
Analyzing saved system images is sometimes called post-mortem debugging.
There is a class of analysis tools designed to work on
both live systems and saved images; most of them are linked with the
.Xr kvm 3
library and share option flags to specify the kernel and memory image.
These tools typically take the following flags:
.Bl -tag -width indent
.It Fl N Ar system
Takes a kernel
.Ar system
image as an argument.
The symbolic information is read from this image,
so it must not be stripped.
In some cases, the
.Pa bsd.gdb
version of the kernel is even more useful.
.It Fl M Ar core
Normally
.Ar core
is an image produced by
.Xr savecore 8 ,
but it can also be
.Pa /dev/mem
when examining the live system.
.El
.Pp
The following commands understand these options:
.Xr fstat 1 ,
.Xr netstat 1 ,
.Xr nfsstat 1 ,
.Xr ps 1 ,
.Xr systat 1 ,
.Xr w 1 ,
.Xr dmesg 8 ,
.Xr iostat 8 ,
.Xr kgmon 8 ,
.Xr pstat 8 ,
.Xr slstats 8 ,
.Xr trpt 8 ,
.Xr trsp 8 ,
.Xr vmstat 8
and many others.
There are exceptions, however.
For instance,
.Xr ipcs 1
uses
.Fl C
instead of
.Fl M .
.Pp
Examples of use:
.Pp
.Bd -literal
    # ps -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -O paddr
.Ed
.Pp
The
.Fl O Ar paddr
option prints the address of each process's
.Li struct proc ,
but with the value of KERNBASE masked off.
This is very useful information when analyzing process contexts in
.Xr gdb 1 .
KERNBASE must be added back, however; its value can be found in
.Pa /usr/include/$ARCH/param.h .
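.Pp
The following C function is a minimal sketch of that addition.
It assumes the
.Ar paddr
field is printed in hexadecimal and that the masked-off value can simply
be added back, as described above; the caller must supply the KERNBASE
value taken from that header, and the name
.Fn paddr_to_kva
is hypothetical.
.Bd -literal -offset indent
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a paddr field printed by
 * ps -O paddr back into a kernel address by adding KERNBASE.
 * Base 16 accepts the digits with or without a leading 0x.
 */
uint64_t
paddr_to_kva(const char *paddr_field, uint64_t kernbase)
{
	return ((uint64_t)strtoull(paddr_field, NULL, 16) + kernbase);
}
.Ed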
.Pp
.Bd -literal
    # vmstat -N /var/crash/bsd.0 -M /var/crash/bsd.0.core -m
.Ed
.Pp
This shows memory allocations at the time of the crash.
Perhaps some resource was starving the system?
.Sh CRASH LOCATION DETERMINATION
The following example should make it easier for a novice kernel
developer to find out where the kernel crashed.
.Pp
First, in
.Xr ddb 4 ,
find the function that caused the crash.
It is either the function at the top of the traceback or the function
under the call to
.Fn panic
or
.Fn uvm_fault .
.Pp
The point of the crash usually looks something like
.Dq function+0x4711 .
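.Pp
The offset part of such a location is hexadecimal.
The following C function is a minimal sketch that splits a location of
that form into the function name and the numeric offset; it assumes a
single
.Sq +
separating the two, and the name
.Fn split_location
is hypothetical.
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a ddb location such as "foo+0x4711"
 * into the function name and the offset.  Returns 0 on success,
 * -1 if there is no '+', the name does not fit, or the offset is
 * not a valid hexadecimal number.
 */
int
split_location(const char *loc, char *name, size_t namelen,
    unsigned long *off)
{
	const char *plus;
	char *end;
	size_t n;

	if ((plus = strchr(loc, '+')) == NULL)
		return (-1);
	n = (size_t)(plus - loc);
	if (n == 0 || n >= namelen)
		return (-1);
	memcpy(name, loc, n);
	name[n] = 0;		/* terminate the copied name */
	*off = strtoul(plus + 1, &end, 16);
	if (end == plus + 1 || *end != 0)
		return (-1);
	return (0);
}
.Ed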
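.Pp
Find the function in the sources; suppose, for example, that it is in
.Pa foo.c .
.Pp
Go to the kernel build directory, for example
.Pa /sys/arch/ARCH/compile/GENERIC .
.Pp
Then rebuild the object file with debugging information and
disassemble it:
.Bd -literal
    # rm foo.o
    # make -n foo.o | sed 's,-c,-g -c,' | sh
    # objdump -S foo.o | less
.Ed
.Pp
Here
.Ql make -n
prints the commands that would build
.Pa foo.o
without running them,
.Xr sed 1
inserts
.Fl g
before the first
.Fl c
on each line so that debugging information is generated, and
.Xr sh 1
runs the modified commands.
.Ql objdump -S
then disassembles the object file, interleaving the C source code
with the instructions.
.Pp
Find the function in the output.
The function will look something like this:
.Pp
.Bd -literal
     0: 17 47 11 42         foo %x, bar, %y
     4: foo bar             allan %kaka
     8: XXXX                boink %bloyt
    etc.
.Ed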
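.Pp
The first number on each line is the offset.
Find the offset reported in the ddb trace
(0x4711 in this example; objdump prints it without the 0x prefix).
These offsets are relative to the start of the object file's text
section, so if the function does not begin at offset 0, add its starting
address, printed in the header line that names the function, to the ddb
offset.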
.Pp
When reporting data collected in this way, include about 20 lines before
and about 10 lines after the offset from the objdump output in the crash
report, as well as the output of the
.Xr ddb 4
.Ic show registers
command.
It is important that the objdump output includes at least two or
three lines of C code.
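.Pp
The search for the crash line can also be automated.
The following C function is a minimal sketch that scans
.Ql objdump -S
output for the instruction at a given function offset and returns its
line number.
It assumes GNU objdump's output format, with each function introduced by a
header line such as
.Ql 00000120 <foo>:
and instruction lines beginning with a hexadecimal address followed by a
colon; the name
.Fn find_crash_line
is hypothetical.
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the 1-based line number of the
 * instruction at func+off in objdump -S output read from fp,
 * or 0 if it is not found.  Addresses in a relocatable object
 * are section-relative, so the start address taken from the
 * function's header line is added to off.
 */
long
find_crash_line(FILE *fp, const char *func, unsigned long off)
{
	char line[1024], name[256];
	unsigned long addr, start = 0;
	long lineno = 0;
	int in_func = 0, n;

	while (fgets(line, sizeof(line), fp) != NULL) {
		lineno++;
		/* header line, e.g. "00000120 <foo>:" */
		if (sscanf(line, "%lx <%255[^>]>:", &addr, name) == 2) {
			in_func = (strcmp(name, func) == 0);
			start = addr;
			continue;
		}
		/* instruction line, e.g. "  124:  55  push %ebp" */
		n = 0;
		if (in_func && sscanf(line, " %lx:%n", &addr, &n) == 1 &&
		    n > 0 && addr == start + off)
			return (lineno);
	}
	return (0);
}
.Ed
.Pp
If, for example, the function returns 120, lines 100 through 130 of the
same output cover the context recommended above and can be printed with
.Xr sed 1 .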
.Sh REPORTING
If you are sure you have found a reproducible software bug in the kernel
and need help with further diagnosis, or already have a fix, use
.Xr sendbug 1
to send the developers a detailed description, including the entire
session from
.Xr gdb 1 .
.Sh "SEE ALSO"
.Xr gdb 1 ,
.Xr sendbug 1 ,
.Xr ddb 4 ,
.Xr reboot 8 ,
.Xr savecore 8
325