Bug hunting
+++++++++++

Last updated: 28 October 2016

Fixing the bug
==============

Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.

objdump
-------

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example::

    $ objdump -r -S -l --disassemble net/dccp/ipv4.o

.. note::

   You need to be at the top level of the kernel tree for this to pick up
   your C files.

If you don't have access to the code, you can also debug from the crash dump
itself, e.g. using the crash dump output as shown by Dave Miller::

     EIP is at 	+0x14/0x4c0
      ...
     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
     00 00 55 57  56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
     <8b> 83 3c 01 00 00 89 44  24 14 8b 45 28 85 c0 89 44 24 18 0f 85

     Put the bytes into a "foo.s" file like this:

            .text
            .globl foo
     foo:
            .byte  .... /* bytes from Code: part of OOPS dump */

     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
     "objdump --disassemble foo.o".

     Output:

     ip_queue_xmit:
         push       %ebp
         push       %edi
         push       %esi
         push       %ebx
         sub        $0xbc, %esp
         mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
         mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
         mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

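Writing the ``.byte`` directives by hand is tedious, so the step can be
scripted. The following is only a minimal sketch: the function name
``code_line_to_asm`` and the ``foo`` label are illustrative assumptions, and it
assumes the ``Code:`` bytes are whitespace-separated hex pairs with the
faulting byte wrapped in ``<>`` as in the dump above.

```python
def code_line_to_asm(code_text, symbol="foo"):
    """Turn the hex bytes of an oops "Code:" line into GNU as source.

    Returns (asm_source, fault_offset). fault_offset is the index of the
    byte that was wrapped in <..>, or None if no byte was marked.
    """
    # Keep only what follows "Code:" (hypothetical helper, not a kernel API).
    body = code_text.split("Code:", 1)[-1]
    values = []
    fault_offset = None
    for token in body.split():
        if token.startswith("<") and token.endswith(">"):
            fault_offset = len(values)   # first byte of the faulting insn
            token = token[1:-1]
        values.append(int(token, 16))

    lines = ["\t.text", f"\t.globl {symbol}", f"{symbol}:"]
    # Emit eight bytes per .byte directive to keep the file readable.
    for i in range(0, len(values), 8):
        chunk = ", ".join(f"0x{b:02x}" for b in values[i:i + 8])
        lines.append(f"\t.byte {chunk}")
    return "\n".join(lines) + "\n", fault_offset


if __name__ == "__main__":
    sample = "Code: 8b 5d 08 <8b> 83 3c 01 00 00"
    asm, fault = code_line_to_asm(sample)
    print(asm, end="")
    print(f"# faulting instruction starts at byte offset {fault}")
```

Save the printed text as ``foo.s`` and continue with the ``gcc`` and
``objdump`` commands quoted above. The kernel tree also ships
``scripts/decodecode``, which automates the same procedure.
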
gdb
---

In addition, you can use GDB to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.

Using gdb requires a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::

  $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO

On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the OOPS::

 EIP:    0060:[<c021e50e>]    Not tainted VLI

And use GDB to translate that to human-readable form::

  $ gdb vmlinux
  (gdb) l *0xc021e50e

If you don't have ``CONFIG_DEBUG_INFO`` enabled, you can use the function
offset from the OOPS::

 EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::

  $ make vmlinux
  $ gdb vmlinux
  (gdb) l *vt_ioctl+0xda8
  0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
  288	{
  289		struct vc_data *vc = NULL;
  290		int ret = 0;
  291
  292		console_lock();
  293		if (VT_BUSY(vc_num))
  294			ret = -EBUSY;
  295		else if (vc_num)
  296			vc = vc_deallocate(vc_num);
  297		console_unlock();

or, if you want to be more verbose::

  (gdb) p vt_ioctl
  $1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
  (gdb) l *0xae0+0xda8

You could, instead, use the object file::

  $ make drivers/tty/
  $ gdb drivers/tty/vt/vt_ioctl.o
  (gdb) l *vt_ioctl+0xda8

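The ``function+offset/size`` notation used above is plain hexadecimal
arithmetic: the address gdb lists is the symbol's start address plus the
offset. As a small illustration (the helper name ``resolve`` is an assumption
made for this sketch, and the base address is the one printed by
``p vt_ioctl`` above):

```python
def resolve(symbol_base, location):
    """Split "func+0xoff/0xsize" and return (name, address).

    symbol_base is the start address of the function, e.g. as printed by
    gdb's "p func". Raises ValueError if the offset lies past the function.
    """
    name, rest = location.split("+", 1)
    offset_text, size_text = rest.split("/", 1)
    offset = int(offset_text, 16)
    size = int(size_text, 16)
    if offset > size:
        raise ValueError(f"offset {offset:#x} is beyond {name} ({size:#x} bytes)")
    return name, symbol_base + offset


if __name__ == "__main__":
    name, addr = resolve(0xae0, "vt_ioctl+0xda8/0x1482")
    print(f"(gdb) l *{addr:#x}")   # prints "(gdb) l *0x1888"
```

The result, ``0x1888``, is the same address gdb reports in the
``l *vt_ioctl+0xda8`` listing above.
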
If you have a call trace, such as::

     Call Trace:
      [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
      [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
      [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
      ...

this shows that the problem is likely in the :jbd: module. You can load that
module in gdb and list the relevant code::

  $ gdb fs/jbd/jbd.ko
  (gdb) l *log_wait_commit+0xa3

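For longer traces it can help to generate the gdb ``l`` commands rather than
type them. The sketch below is illustrative only: ``gdb_commands`` is a
hypothetical helper, and the regular expression assumes the frame format shown
above (``[<address>]``, an optional ``:module:`` prefix, then
``function+offset/size``). Other kernel versions print frames differently.

```python
import re
from collections import defaultdict

# One frame: "[<addr>] :module:func+0xoff/0xsize" (module part optional).
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<func>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def gdb_commands(trace_text):
    """Group gdb "l *func+offset" commands by the object they belong to.

    Frames without a module prefix are filed under "vmlinux".
    """
    commands = defaultdict(list)
    for match in FRAME.finditer(trace_text):
        target = match["module"] or "vmlinux"
        commands[target].append(f"l *{match['func']}+{match['offset']}")
    return dict(commands)


if __name__ == "__main__":
    trace = """
     Call Trace:
      [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
      [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
      [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
    """
    for target, cmds in gdb_commands(trace).items():
        print(target)
        for cmd in cmds:
            print(f"  (gdb) {cmd}")
```

The helper does not know where a module's ``.ko`` file lives, so you still
have to start gdb on the right object yourself, as in the ``fs/jbd/jbd.ko``
example above.
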
Another very useful option of the Kernel Hacking section in menuconfig is
Debug memory allocations. This will help you see whether data has been
initialised before use, etc. To see the values that get assigned
with this, look at ``mm/slab.c`` and search for ``POISON_INUSE``. When using
this, an Oops will often show the poisoned data instead of zero, which is the
default.

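Poisoned data is easy to overlook in a register dump. As a rough sketch, the
helper below flags values made up entirely of one poison byte. The name
``describe_poison`` is hypothetical, and the byte values (``0x5a``, ``0x6b``,
``0xa5``) are assumed to match the definitions in ``include/linux/poison.h``;
check them against your own tree before relying on them.

```python
# Assumed slab poison bytes; verify against include/linux/poison.h.
POISON_BYTES = {
    0x5A: "POISON_INUSE",   # allocated but not yet initialised
    0x6B: "POISON_FREE",    # object has been freed
    0xA5: "POISON_END",     # end-of-object marker
}


def describe_poison(value, width=4):
    """Return the poison name if every byte of value is the same poison byte.

    width is the size of the value in bytes (4 or 8 for a register).
    Returns None for anything else, including values wider than width.
    """
    try:
        raw = value.to_bytes(width, "little")
    except OverflowError:
        return None
    if len(set(raw)) != 1:
        return None
    return POISON_BYTES.get(raw[0])


if __name__ == "__main__":
    for reg in (0x6B6B6B6B, 0x5A5A5A5A, 0x00000000, 0xC021E50E):
        print(f"{reg:#010x}: {describe_poison(reg)}")
```

A register holding ``0x6b6b6b6b`` in an Oops would, under these assumptions,
suggest that the code read from memory that had already been freed.
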
Once you have worked out a fix, please submit it upstream. After all, open
source is about sharing what you do, and don't you want to be recognised for
your genius?

Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.
