Analyzing CPU registers during kernel crash dump

Question

I was debugging an issue and hit the kernel crash below, with a crash dump generated alongside it. To some extent I know how to get to the exact line in the code where the issue occurred, using the gdb command l *(debug_fucntion+0x19).

<1>BUG: unable to handle kernel paging request at ffffc90028213000
<1>IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>PGD 103febe067 PUD 103febf067 PMD fd54e1067 PTE 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7
<4>Modules linked in: dise(P)(U) ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput ipmi_devintf power_meter microcode iTCO_wdt iTCO_vendor_support dcdbas sg ses enclosure serio_raw lpc_ich mfd_core i7core_edac edac_core bnx2 ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dise]
<4>
<4>Pid: 1126, comm: diseproc Tainted: P        W  ---------------    2.6.32-431.el6.x86_64 #1 Dell Inc. PowerEdge R710/0MD99X
<4>RIP: 0010:[<ffffffffa0180279>]  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>RSP: 0018:ffff880435fc5b88  EFLAGS: 00010282
<4>RAX: 0000000000000000 RBX: 0000000000010000 RCX: ffffc90028213000
<4>RDX: 0000000000010040 RSI: 0000000000010000 RDI: ffff880fe36a0000
<4>RBP: ffff880435fc5b88 R08: ffffffffa025d8a3 R09: 0000000000000000
<4>R10: 0000000000000004 R11: 0000000000000004 R12: 0000000000010040
<4>R13: 000000000000b101 R14: ffffc90028213010 R15: ffff880fe36a0000
<4>FS:  00007fbe6040b700(0000) GS:ffff8800618e0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: ffffc90028213000 CR3: 0000000fc965b000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)
<4>Stack:
<4> ffff880435fc5be8 ffffffffa0180498 0000000081158f46 00000c200000fd26
<4><d> ffffc90028162000 0000fec635fc5bc8 0000000000000018 ffff881011d80000
<4><d> ffffc90028162000 ffff8802f18fe440 ffff880fc80b4000 ffff880435fc5cec
<4>Call Trace:
<4> [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
<4> [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
<4> [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
<4> [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
<4> [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
<4> [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
<4> [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
<4> [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: be c4 10 e1 48 8b 5d d8 44 01 f0 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 01 48 c1 e8 3c 83 f8 08 76 0b e8 f6 fb ff ff c9 c3 0f 1f
<1>RIP  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4> RSP <ffff880435fc5b88>
<4>CR2: ffffc90028213000

My questions are:

  1. Can the CPU register contents that are printed give more information? How do I decode them?
  2. Can I find out, from the crash dump, the values of the variables or data structures that led to the crash?
  3. What does the "Code: be c4 10 e1 48 8b 5d ..." line tell me here?

Recommended answer

You must understand that you are inspecting (not debugging) at the assembly level (not the source level). Keep this in mind whenever you inspect crash dumps.

Read your crash dump report carefully, line by line: it contains a lot of information, and it is all you have.

Once you have found the place where your code crashed, you have to figure out why it happened by reading the crash dump report and the disassembly.

The first line of your crash dump report tells you

BUG: unable to handle kernel paging request at ffffc90028213000

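That means your code accessed invalid memory: the kernel faulted on address ffffc90028213000, which has no valid page-table entry (note the PTE 0 on the PGD/PUD/PMD line of the report).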
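The "Oops: 0000" line of the same report carries the x86 page-fault error code, which narrows down what kind of invalid access it was. The following is a minimal sketch of decoding it; the bit meanings are the standard x86 page-fault error-code bits, while the function name decode_oops_code and the returned dictionary layout are made up for illustration.

```python
# x86 page-fault error-code bits (the kernel names them PF_PROT, PF_WRITE, ...).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: kernel mode,      1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault happened on an instruction fetch


def decode_oops_code(line):
    """Decode the hex error code from a line such as 'Oops: 0000 [#1] SMP'.

    Hypothetical helper: it assumes the code is the second
    whitespace-separated field of the line.
    """
    code = int(line.split()[1], 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


info = decode_oops_code("Oops: 0000 [#1] SMP")
for key, value in info.items():
    print(f"{key}: {value}")
```

For this report all bits are clear, which corresponds to a kernel-mode read of a page that is not present.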

Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)

tells you what was happening in user space at the time of the crash. It looks like the user-space process diseproc issued a command to your driver that triggered the crash.

A very important line is

IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]

Issue the dis debug_fucntion command in the crash utility to disassemble the function (note that the symbol is spelled debug_fucntion in the oops), find debug_fucntion+25 (0x19 hex = 25 decimal) and look around. Read the disassembly side by side with the C source for that function. Usually you can locate the crash site in the C code by matching callq instructions: the disassembly shows the printable names of the called functions.
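The arithmetic behind the symbol+offset/size notation can be sketched as follows. This is only an illustration: the regular expression and the helper name parse_symbol_ref are assumptions, not part of any kernel tool, and the input is the IP line copied from the report.

```python
import re

# Symbol reference exactly as printed in the oops "IP:" line.
ip_line = "IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]"

# Groups: address, symbol, offset into the function, function size, module.
SYMBOL_REF = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?"
)


def parse_symbol_ref(line):
    """Split 'symbol+0xOFF/0xSIZE [module]' into numeric parts (hypothetical helper)."""
    match = SYMBOL_REF.search(line)
    if match is None:
        raise ValueError("no symbol reference found")
    addr, symbol, offset, size, module = match.groups()
    addr, offset, size = int(addr, 16), int(offset, 16), int(size, 16)
    return {
        "address": addr,
        "symbol": symbol,
        "offset": offset,                # bytes from the start of the function
        "size": size,                    # total length of the function in bytes
        "module": module,
        "function_start": addr - offset,
    }


ref = parse_symbol_ref(ip_line)
print(f"{ref['symbol']}+{ref['offset']} (decimal) of {ref['size']} bytes, module {ref['module']}")
print(f"function starts at {ref['function_start']:#x}")
```

The computed start address is where the disassembly listing for the function begins; the faulting instruction sits 25 bytes after it.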

Next, and most important, is the call trace:

Call Trace:
 [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
 [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
 [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Reading bottom to top: the kernel received an ioctl (obviously from diseproc), invoked the ioctl handler dise_ioctl in the dise module, then process_debug, debug_log_show and finally cmd_dump. Entries marked with ? (such as current_fs_time) are addresses the unwinder found on the stack but could not confirm as part of the call chain, so treat them as unreliable rather than as steps in the path.
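As a rough sketch of reading the trace bottom to top, the snippet below separates the confirmed frames from the entries the kernel marked with ?. The trace text is copied from the report; the regular expression and the function name code_path are illustrative assumptions.

```python
import re

call_trace = """\
 [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
 [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
 [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
"""

# Group 1 is the optional "? " marker, group 2 is the symbol name.
ENTRY = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def code_path(trace):
    """Return (confirmed frames in call order, entries marked '?')."""
    confirmed, uncertain = [], []
    for line in trace.splitlines():
        match = ENTRY.search(line)
        if match is None:
            continue
        (uncertain if match.group(1) else confirmed).append(match.group(2))
    # The trace lists the innermost frame first, so reverse it to get call order.
    return list(reversed(confirmed)), uncertain


path, unreliable = code_path(call_trace)
print(" -> ".join(path))
print("marked '?':", ", ".join(unreliable))
```

The first list is the call order from the system call entry down to cmd_dump; the second holds the names that should not be read as part of the path.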

Now you know:

  • Code path: dise_ioctl -> process_debug -> debug_log_show -> cmd_dump -> debug_fucntion (where the fault occurred).
  • Approximate place in the C code that caused the crash
  • Reason for the crash: access to invalid memory

With this information, you have to use your last and most powerful method: thinking. Try to work out which variables or structures caused the crash. Maybe some of them had already been freed by the time you reached debug_fucntion? Maybe you made a mistake in pointer arithmetic?

Answers to your questions:

  1. Most of the time, CPU register values are hard to use on their own because they do not map directly onto your C code: they are just values, some of them pointing at memory. There are some extremely useful registers, such as RIP/EIP and RSP/ESP, but most of the others lack context unless you read them alongside the disassembly.

  2. Very unlikely. You are not actually debugging; you are inspecting a dump, so you have no debugging context.

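  3. I agree with @user2699113: it is just the memory content at the address RIP points to, that is, the raw instruction bytes around the faulting instruction. The byte in angle brackets (<48>) is the first byte of the instruction at RIP.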
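To make answers 1 and 3 concrete, here is a small sketch that splits the Code: line at the marked byte, decodes the first instruction at RIP, and checks which general-purpose registers sit near the faulting address in CR2. It is a hand-rolled illustration, not a disassembler: decode_mov_load handles only the single encoding "REX.W 8B /r" with a plain register-indirect operand, and all helper names are hypothetical. In practice the disassembly from your debugger is the authoritative source.

```python
import re

# Lines copied from the oops above (log-level prefixes such as <4> removed).
code_line = (
    "Code: be c4 10 e1 48 8b 5d d8 44 01 f0 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 "
    "f0 4c 8b 7d f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b "
    "01 48 c1 e8 3c 83 f8 08 76 0b e8 f6 fb ff ff c9 c3 0f 1f"
)
register_dump = """\
RAX: 0000000000000000 RBX: 0000000000010000 RCX: ffffc90028213000
RDX: 0000000000010040 RSI: 0000000000010000 RDI: ffff880fe36a0000
RBP: ffff880435fc5b88 R08: ffffffffa025d8a3 R09: 0000000000000000
R10: 0000000000000004 R11: 0000000000000004 R12: 0000000000010040
R13: 000000000000b101 R14: ffffc90028213010 R15: ffff880fe36a0000
"""
cr2 = 0xffffc90028213000  # faulting address from the "CR2:" line

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_code(line):
    """Return (bytes before RIP, bytes starting at RIP) from a 'Code:' line."""
    tokens = line.split(":", 1)[1].split()
    marked = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    before = bytes(int(tok, 16) for tok in tokens[:marked])
    at_rip = bytes(int(tok.strip("<>"), 16) for tok in tokens[marked:])
    return before, at_rip


def decode_mov_load(insn):
    """Decode only 'REX.W 8B /r' with a plain (reg) operand; otherwise None."""
    if len(insn) < 3 or insn[0] != 0x48 or insn[1] != 0x8B:
        return None
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):  # SIB and RIP-relative forms are not handled
        return None
    return f"mov (%{REG64[rm]}),%{REG64[reg]}"


def parse_registers(text):
    """Map register names (RAX, R08, ...) to integer values."""
    pairs = re.findall(r"\b(R[A-Z0-9]{2}):\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def near(regs, target, window=0x1000):
    """Registers whose value lies within `window` bytes of `target`."""
    return {n: v - target for n, v in regs.items() if abs(v - target) < window}


before, at_rip = split_code(code_line)
insn = decode_mov_load(at_rip)
regs = parse_registers(register_dump)
print(len(before), "bytes precede RIP,", len(at_rip), "bytes start at RIP")
print("instruction at RIP:", insn)
for name, delta in near(regs, cr2).items():
    print(f"{name} = CR2 + {delta:#x}")
```

Under these assumptions the instruction at RIP is a 64-bit load through RCX, and RCX holds exactly the address reported in CR2. So the question to take back to the C source is which pointer the function dereferences first, and where that pointer came from.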

And remember: the best debugging tool is your brain.
