How to make good use of stack trace (from kernel or core dump)?


Question

If you are lucky, when your kernel module crashes you get an oops with a log containing a lot of information, such as the values in the registers. One such piece of information is the stack trace. (The same is true for core dumps, but I originally asked this about kernel modules.) Take this example:

[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
[<f97aba45>] ? cleanup_module+0x1e5/0x550 [skin_kernel]
[<c017d0e7>] ? __stop_machine+0x57/0x70
[<c016dec0>] ? __try_stop_module+0x0/0x30
[<c016f069>] ? sys_delete_module+0x149/0x210
[<c0102f24>] ? sysenter_do_call+0x12/0x16

My guess is that +<number1>/<number2> has something to do with the offset into the function in which the error occurred. That is, by inspecting these numbers, perhaps while looking at the assembly output, I should be able to find the line (better yet, the instruction) at which the error occurred. Is that correct?

My question is, what are these two numbers exactly? How do you make use of them?

Solution

skink_free_devices+0x32/0xb0

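This means the address lies 0x32 bytes from the start of the function skink_free_devices(), which is 0xb0 bytes long in total. For the frame where the fault happened, that is the offending instruction; for callers further down the trace, it is the return address, i.e. the instruction right after the call.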
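To make the arithmetic concrete, here is a minimal Python sketch that splits one trace line into its parts and derives where the function starts and ends. The regular expression and the names Frame and parse_frame are illustrative assumptions, not part of any kernel tooling; the line format is simply the one shown in the question.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    address: int           # absolute address printed inside [<...>]
    symbol: str            # function name
    offset: int            # bytes from the start of the function
    size: int              # total length of the function in bytes
    module: Optional[str]  # module name, or None for the core kernel


# Matches lines such as:
#   [<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s*(?:\?\s*)?"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed frame, or None if the line is not a trace entry."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        address=int(m["addr"], 16),
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["mod"],
    )


frame = parse_frame("[<f97ade02>] ? skink_free_devices+0x32/0xb0 [skin_kernel]")
# The function begins `offset` bytes before the printed address.
start = frame.address - frame.offset
print(f"{frame.symbol} starts at {start:#x} and ends before {start + frame.size:#x}")
```

For the first line of the trace this gives a start address of 0xf97addd0 (0xf97ade02 minus 0x32) and an end of 0xf97ade80 (start plus 0xb0).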

If you compile your kernel with -g enabled (debugging information; for kernel builds this is typically controlled by CONFIG_DEBUG_INFO), you can get the source line for an address inside a function using the addr2line tool or our good old gdb.

Something like this:

$ addr2line -e ./vmlinux 0xc01cf0d1
/mnt/linux-2.5.26/include/asm/bitops.h:244
or
$ gdb ./vmlinux
...
(gdb) l *0xc01cf0d1
0xc01cf0d1 is in read_chan (include/asm/bitops.h:244).
(...)
244     return ((1UL << (nr & 31)) & (((const volatile unsigned int *) addr)[nr >> 5])) != 0;
(...)

So just give the address you want to inspect to addr2line or gdb, and they will tell you the source file and line number that the address corresponds to. See this article for full details.
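Because the trace reports frames as symbol+offset, it can be handy to pass that form straight to gdb, whose list command accepts an expression such as *(skink_free_devices+0x32). The Python sketch below only formats the two command lines so they can be pasted into a shell. The helper names addr2line_cmd and gdb_list_cmd and the path ./skin_kernel.ko are hypothetical, and running the commands assumes the binary was built with debug info.

```python
import shlex
from typing import List


def addr2line_cmd(binary: str, address: int) -> List[str]:
    # -e selects the binary carrying debug info, -f also prints the function name
    return ["addr2line", "-f", "-e", binary, f"{address:#x}"]


def gdb_list_cmd(binary: str, symbol: str, offset: int) -> List[str]:
    # -batch makes gdb exit after running the command given with -ex
    return ["gdb", "-batch", "-ex", f"list *({symbol}+{offset:#x})", binary]


# Absolute address in the core kernel, as in the answer's example.
print(shlex.join(addr2line_cmd("./vmlinux", 0xc01cf0d1)))

# symbol+offset taken from the first frame of the trace; the .ko path is hypothetical.
print(shlex.join(gdb_list_cmd("./skin_kernel.ko", "skink_free_devices", 0x32)))
```

The symbol+offset form is the more practical one for a module frame, since the absolute address in the trace depends on where the module happened to be loaded.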

EDIT: vmlinux is the uncompressed version of the kernel used for debugging. It is generally found at /lib/modules/$(uname -r)/build/vmlinux, provided you have built your kernel from source. The vmlinuz that you find in /boot is the compressed kernel and may not be that useful for debugging.
