How to make good use of stack trace (from kernel or core dump)?
If you are lucky, when your kernel module crashes you get an oops whose log carries a lot of information, such as the values in the registers. One such piece of information is the stack trace. (The same is true for core dumps, but I originally asked this about kernel modules.) Take this example:
[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
[<f97aba45>] ? cleanup_module+0x1e5/0x550 [skin_kernel]
[<c017d0e7>] ? __stop_machine+0x57/0x70
[<c016dec0>] ? __try_stop_module+0x0/0x30
[<c016f069>] ? sys_delete_module+0x149/0x210
[<c0102f24>] ? sysenter_do_call+0x12/0x16
My guess is that the +<number1>/<number2> part has something to do with the offset from the function in which the error occurred. That is, by inspecting this number, and perhaps looking at the assembly output, I should be able to find the line (better yet, the instruction) where the error occurred. Is that correct?
My question is: what exactly are these two numbers, and how do you make use of them?
skink_free_devices+0x32/0xb0
This means the offending instruction is 0x32 bytes from the start of the function skink_free_devices(), which is 0xb0 bytes long in total.
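To make the arithmetic concrete, here is a minimal Python sketch that splits one trace line into its parts and recovers the address where the function starts (frame address minus offset). The regular expression, the parse_frame helper and the dictionary keys are my own illustration, not part of any kernel tool, and the pattern only assumes the line layout shown in the question.

```python
import re

# Matches one call-trace line in the layout shown in the question, e.g.
#   [<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
# (hypothetical pattern, written only for this illustration)
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frame(line):
    """Return the fields of one trace line as a dict, or None if it does not match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)  # bytes from the start of the function
    size = int(m.group("size"), 16)      # total length of the function in bytes
    return {
        "address": addr,
        "symbol": m.group("symbol"),
        "offset": offset,
        "size": size,
        "module": m.group("module"),     # None when no [module] suffix is present
        "function_start": addr - offset,
    }


frame = parse_frame("[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]")
print(frame["symbol"], hex(frame["function_start"]), frame["offset"] < frame["size"])
```

For the first line of the trace this puts the start of skink_free_devices() at 0xf97addd0, and the offset 0x32 falls inside the function's 0xb0 bytes, as it should.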
If you compile your kernel with -g enabled, you can get the line number inside the function where control jumped, using the tool addr2line or our good old gdb. Something like this:
$ addr2line -e ./vmlinux 0xc01cf0d1
/mnt/linux-2.5.26/include/asm/bitops.h:244
or
$ gdb ./vmlinux
...
(gdb) l *0xc01cf0d1
0xc01cf0d1 is in read_chan (include/asm/bitops.h:244).
(...)
244 return ((1UL << (nr & 31)) & (((const volatile unsigned int *) addr)[nr >> 5])) != 0;
(...)
So just give the address you want to inspect to addr2line or gdb, and they will tell you the source file and line number where the offending code lives.
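If the trace has many frames, the lookup can be scripted. The sketch below is only an illustration: the addr2line_command helper and its regular expression are made up for this answer, and ./vmlinux is a placeholder path. It collects the addresses from a trace and builds a single addr2line invocation (-e selects the image, -f also prints function names). It deliberately skips frames tagged with a [module] suffix, on the assumption that module code is not contained in vmlinux and has to be resolved against the module's own object file instead.

```python
import re
import shlex

# Hypothetical pattern for the trace layout shown in the question.
# Group 1 is the frame address; group 2 is the optional " [module]" suffix.
FRAME_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+(?:\?\s+)?[\w.]+\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
    r"(\s+\[[^\]]+\])?"
)


def addr2line_command(trace, vmlinux="./vmlinux"):
    """Build an addr2line argument list for the built-in (non-module) frames."""
    addresses = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(2) is None:  # skip frames that belong to a module
            addresses.append("0x" + m.group(1))
    return ["addr2line", "-f", "-e", vmlinux] + addresses


TRACE = """\
[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
[<c017d0e7>] ? __stop_machine+0x57/0x70
[<c016f069>] ? sys_delete_module+0x149/0x210
"""

cmd = addr2line_command(TRACE)
print(shlex.join(cmd))  # pass cmd to subprocess.run() to actually execute it
```

The command is returned as a list rather than a string so that it can be handed to subprocess.run() without going through a shell.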
See this article for full details
EDIT: vmlinux is the uncompressed version of the kernel used for debugging and is generally found at /lib/modules/$(uname -r)/build/vmlinux, provided you have built your kernel from sources. The vmlinuz that you find in /boot is the compressed kernel and may not be that useful for debugging.
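A quick sanity check before handing an image to addr2line or gdb is to see whether it is an ELF file at all. The Python sketch below is a rough illustration with made-up helper names (looks_like_elf, file_is_elf); it only tests the four-byte ELF magic number. It does not tell you whether debug information is present, and I am assuming here that the compressed image in /boot will typically fail the check.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def looks_like_elf(header):
    """Return True if the given leading bytes start with the ELF magic number."""
    return header.startswith(ELF_MAGIC)


def file_is_elf(path):
    """Read the first four bytes of a file and check them against the ELF magic."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))


# Demonstration on in-memory headers, so no kernel image is needed to try it.
print(looks_like_elf(b"\x7fELF\x01\x01\x01\x00"))  # an ELF-style header
print(looks_like_elf(b"MZ\x00\x00"))               # some other format
```

On a real system you would call file_is_elf() on the vmlinux path from your build tree.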