How do symbols affect call stack walking?

Problem description

I'm trying to analyze a crash dump with WinDbg, and I'm getting inconsistent call stacks depending on which symbols are loaded. My simple understanding is that the symbols only help point to what the stack is referring to, but the stack itself is unchanged. That's obviously wrong, but now I don't know what the heck I'm looking at.

Here's a call stack with all symbols loaded:

0:000> kn
 # ChildEBP RetAddr  
00 0012e120 7d61f60f ntdll!ZwGetContextThread+0x12
01 0012e130 000f0005 ntdll!RtlFreeHeap+0x711
WARNING: Frame IP not in any known module. Following frames may be wrong.
02 0012e1d0 6d5b5b20 0xf0005
03 0012e314 6d5b407f dbghelp!Win32LiveSystemProvider::OpenMapping+0x228
04 0012e464 0012e488 dbghelp!GenAllocateModuleObject+0x1ad
05 0012e4e4 6d5b588e 0x12e488
06 0012e69c 7d4d132f dbghelp!Win32LiveSystemProvider::GetOsCsdString+0x4d
07 0012e6b8 6d5b5fd2 kernel32!ReadProcessMemory+0x1b
08 0012e6e0 6d5b604e dbghelp!Win32LiveSystemProvider::ReadVirtual+0x3d
09 0012e700 6d5b2f3d dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x1d
0a 0012e728 6d5b304f dbghelp!WriteMemoryFromProcess+0x35
0b 0012e7ac 6d5b345b dbghelp!WriteThreadList+0xc1
0c 0012e7cc 6d5b367b dbghelp!WriteDumpData+0x83
0d 0012e90c 6d5b3778 dbghelp!MiniDumpProvideDump+0x174
*** WARNING: Unable to verify checksum for ERRHNDLR.dll
0e 0012e96c 0091235d dbghelp!MiniDumpWriteDump+0xc8
*** WARNING: Unable to verify timestamp for msvcr90.dll
0f 0012e9fc 7857dcaa ERRHNDLR!ExceptionTranslator+0x25d [c:\redacted\errorhandler.cpp @ 230]
10 0012ea48 7857d4f5 msvcr90!_CallSETranslator+0xa5
11 0012ea7c 7857d8c0 msvcr90!__CxxExceptionFilter+0x217
12 0012eadc 7857d9dd msvcr90!__CxxExceptionFilter+0x5e2
13 0012eb10 7857db94 msvcr90!__InternalCxxFrameHandler+0xdb
*** WARNING: Unable to verify checksum for PROGRAM.exe
14 0012eb84 004f1c9e msvcr90!__CxxFrameHandler3+0x26
15 0012eba8 004f1c9e PROGRAM!__sse2_available_init+0x1269c
16 0012ec0c 00130000 PROGRAM!__sse2_available_init+0x1269c
17 00000000 00000000 0x130000

I can tell that something bad happened, but it appears to have happened as soon as the app started, which isn't the case.

Here's the same call stack, but without the symbols for msvcr90 loaded:

0:000> kn
 # ChildEBP RetAddr  
00 0012e120 7d61f60f ntdll!ZwGetContextThread+0x12
01 0012e130 000f0005 ntdll!RtlFreeHeap+0x711
WARNING: Frame IP not in any known module. Following frames may be wrong.
02 0012e1d0 6d5b5b20 0xf0005
03 0012e314 6d5b407f dbghelp!Win32LiveSystemProvider::OpenMapping+0x228
04 0012e464 0012e488 dbghelp!GenAllocateModuleObject+0x1ad
05 0012e4e4 6d5b588e 0x12e488
06 0012e69c 7d4d132f dbghelp!Win32LiveSystemProvider::GetOsCsdString+0x4d
07 0012e6b8 6d5b5fd2 kernel32!ReadProcessMemory+0x1b
08 0012e6e0 6d5b604e dbghelp!Win32LiveSystemProvider::ReadVirtual+0x3d
09 0012e700 6d5b2f3d dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x1d
0a 0012e728 6d5b304f dbghelp!WriteMemoryFromProcess+0x35
0b 0012e7ac 6d5b345b dbghelp!WriteThreadList+0xc1
0c 0012e7cc 6d5b367b dbghelp!WriteDumpData+0x83
0d 0012e90c 6d5b3778 dbghelp!MiniDumpProvideDump+0x174
*** WARNING: Unable to verify checksum for ERRHNDLR.dll
0e 0012e96c 0091235d dbghelp!MiniDumpWriteDump+0xc8
*** WARNING: Unable to verify timestamp for msvcr90.dll
*** ERROR: Module load completed but symbols could not be loaded for msvcr90.dll
0f 0012e9fc 7857dcaa ERRHNDLR!ExceptionTranslator+0x25d [c:\redacted\errorhandler.cpp @ 230]
10 0012ea48 7857d4f5 msvcr90+0x5dcaa
11 0012ea7c 7857d8c0 msvcr90+0x5d4f5
12 0012eadc 7857d9dd msvcr90+0x5d8c0
13 0012eb10 7857db94 msvcr90+0x5d9dd
14 0012eb4c 7d61ec4a msvcr90+0x5db94
15 0012eb70 7d61ec1b ntdll!ExecuteHandler2+0x26
16 0012ec18 7d61ea56 ntdll!ExecuteHandler+0x24
17 0012ec18 026fe31a ntdll!KiUserExceptionDispatcher+0xe
*** WARNING: Unable to verify checksum for Storage.dll
18 0012ef4c 026fddd0 Storage!CList<Property *,Property *>::AddTail+0xa [c:\program files (x86)\microsoft visual studio 9.0\vc\atlmfc\include\afxtempl.h @ 1003]
*** WARNING: Unable to verify checksum for Storage2.dll
19 0012ef54 0274f5ec Storage!PropertyList::Add+0x10 [c:\redacted\propertylist.cpp @ 236]
1a 0012ef5c 0012f280 Storage2!Thing::Process+0x12c [c:\redacted\thing.cpp @ 345]
1b 0012ef60 0fe8be80 0x12f280
*** WARNING: Unable to verify checksum for PROGRAM.exe
1c 0012f368 0043d9a1 0xfe8be80
1d 0012f3b0 004f1c9e PROGRAM!View::SelectObject+0x151 [c:\redacted\view.cpp @ 2724]
1e 0012f3d4 004ea73b PROGRAM!__sse2_available_init+0x1269c
*** WARNING: Unable to verify checksum for DLL1.dll
1f 0012f450 02847893 PROGRAM!__sse2_available_init+0xb139
*** WARNING: Unable to verify checksum for DLL2.dll
20 0012f4ac 02c06398 DLL1!_RawDllMainProxy+0x1ed5
21 0012f534 02c06b86 DLL2!__sse2_available_init+0x40eb
22 0012f5a8 02c03fdd DLL2!__sse2_available_init+0x48d9
23 0012f5e0 02c052f4 DLL2!__sse2_available_init+0x1d30
24 0012f664 0283c231 DLL2!__sse2_available_init+0x3047
25 0012f6b4 028475aa DLL1!Logic::Send+0x121 [c:\redacted\logic.cpp @ 438]
26 0012f750 7d94757c DLL1!_RawDllMainProxy+0x1bec
27 0012f7a4 00000000 user32!UserCallWinProcCheckWow+0x128

Hey, that may actually be useful! It's also closer to what Visual Studio displays when I use it to debug the crash dump. But VS's call stack is completely different below "Storage2!Thing::Process", suggesting that unrelated functions are somehow in the call stack, which is why I'm trying WinDbg.

So, what am I missing? Why should unloading symbols reveal a potentially more useful call stack?

Recommended answer

It's a long answer, but in short: on x86, PDBs contain FPO (frame pointer omission) information, which allows the debugger to reliably unwind a call stack. This is required for FPO frames, where EBP is not used as a frame pointer. In the absence of PDBs, the debugger assumes that every frame is an EBP frame and will simply walk the EBP chain until it reaches the end (i.e. an unreadable EBP value).

For more details on FPO and EBP frames, there's a good article here:

http://www.nynaeve.net/?p=91
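
To make the difference concrete, here is a small Python simulation. It is only a sketch, not how WinDbg or dbghelp is actually implemented. All addresses, function names, and the FUNCS table are made up for illustration; the table stands in for the FPO records a PDB would supply, and real records carry more than a single locals size. The assumed call chain is main -> A -> B -> C, where B is compiled with FPO and never touches EBP.

```python
# Hypothetical code ranges: name -> (start, end, FPO locals size in bytes, or None
# for an ordinary EBP frame). This plays the role of the PDB's FPO data.
FUNCS = {
    "main": (0x401000, 0x401100, None),
    "A":    (0x401100, 0x401200, None),
    "B":    (0x401200, 0x401300, 8),      # FPO: no EBP frame, 8 bytes of locals
    "C":    (0x401300, 0x401400, None),
}

# Hypothetical stack contents (address -> 32-bit value), lowest address first.
MEMORY = {
    0x12FF44: 0x12FF60,  # C: saved EBP (still A's, because B never set up EBP)
    0x12FF48: 0x401250,  # return address into B
    0x12FF4C: 0,         # B: local
    0x12FF50: 0,         # B: local
    0x12FF54: 0x401150,  # return address into A
    0x12FF58: 0,         # A: local
    0x12FF5C: 0,         # A: local
    0x12FF60: 0x12FF80,  # A: saved EBP (main's)
    0x12FF64: 0x401050,  # return address into main
    0x12FF80: 0,         # main: saved EBP (end of the chain)
    0x12FF84: 0x500000,  # return address into startup code (not in FUNCS)
}


def lookup(addr):
    """Map a code address to (function name, FPO locals size)."""
    for name, (start, end, fpo_locals) in FUNCS.items():
        if start <= addr < end:
            return name, fpo_locals
    return None, None


def walk_naive(eip, ebp, max_frames=32):
    """No symbols: assume every frame is [saved EBP][return address]."""
    names = [lookup(eip)[0]]
    while ebp in MEMORY and ebp + 4 in MEMORY and len(names) < max_frames:
        names.append(lookup(MEMORY[ebp + 4])[0])
        ebp = MEMORY[ebp]
    return names


def walk_with_fpo(eip, esp, ebp, max_frames=32):
    """With FPO data: pick the unwind rule per function."""
    names = []
    while len(names) < max_frames:
        name, fpo_locals = lookup(eip)
        names.append(name)
        if name is None:
            break
        if fpo_locals is None:
            # EBP frame: return address sits just above the saved EBP.
            if ebp not in MEMORY or ebp + 4 not in MEMORY:
                break
            eip, esp, ebp = MEMORY[ebp + 4], ebp + 8, MEMORY[ebp]
        else:
            # FPO frame: return address sits above the locals; EBP is untouched.
            ret_slot = esp + fpo_locals
            if ret_slot not in MEMORY:
                break
            eip, esp = MEMORY[ret_slot], ret_slot + 4
    return names


print(walk_naive(0x401350, 0x12FF44))
print(walk_with_fpo(0x401350, 0x12FF3C, 0x12FF44))
```

In this toy setup the naive walk never reports A: the return address into A lives inside B's frame, which the EBP chain skips over. The FPO-aware walk recovers it because it knows how big B's frame is.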

Now, to get to your issue. The first call stack that you showed is absolutely correct. Some module threw an exception, so the O/S began unwinding call frames looking for an exception handler. Unfortunately, no one handled the error so the default exception handler ran, which proceeded to crash the application. Because the call stack of the offending code was unwound, you don't see anything but the O/S supplied components on the stack.

In the second case, you have no symbols for msvcr90, so the debugger treats every one of its call frames as if it were an EBP frame. In this case, you got "lucky" and picked up a garbage EBP that started to unwind an old call stack. While it pointed to the right thing this time, this is the sort of red herring that can cause you to start your analysis with invalid data and waste a LOT of time (been there, done that!).
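
Here is a rough Python illustration of that failure mode, again with entirely made-up addresses and not a model of the real debugger internals. One saved-EBP slot holds a scratch value that happens to point at readable leftovers from an earlier call, and a naive EBP walk follows it without complaint.

```python
def walk_ebp_chain(memory, ebp, max_frames=16):
    """Naive walk: treat every frame as [saved EBP][return address]."""
    returns = []
    while ebp in memory and ebp + 4 in memory and len(returns) < max_frames:
        returns.append(memory[ebp + 4])
        ebp = memory[ebp]
    return returns


# Hypothetical live frames at the time of the dump.
live = {
    0x0012E000: 0x0012E020, 0x0012E004: 0x7D610010,
    # FPO code used EBP as a general register, so this "saved EBP" is garbage
    # that just happens to point into readable stack memory.
    0x0012E020: 0x0012EF00, 0x0012E024: 0x7D610020,
}

# Hypothetical leftovers from an earlier call that were never overwritten.
stale = {
    0x0012EF00: 0x0012EF40, 0x0012EF04: 0x02700010,
    0x0012EF40: 0x00000000, 0x0012EF44: 0x00430020,
}

memory = {**live, **stale}
returns = walk_ebp_chain(memory, 0x0012E000)
print([hex(r) for r in returns])
```

The walker has no way to tell the last two entries apart from real frames; they look plausible simply because the memory is readable and the values resemble return addresses.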

The .ecxr command is always the correct thing to do in the case of an exception. This works because the O/S stores the register state of the processor at the time of the exception, before it starts looking through call frames for an exception handler. The .ecxr command uses that saved state to take you back to the moment the bad state was detected, rather than showing the stack after the fact, while the O/S was trying to handle it.

-Scott
