How do I debug a difficult-to-reproduce crash with no useful call stack?

Problem description

I am encountering an odd crash in our software and I'm having a lot of trouble debugging it, and so I am seeking SO's advice on how to tackle it.

The crash is an access violation reading a NULL pointer:

First chance exception at $00CF0041. Exception class $C0000005 with message 'access violation at 0x00cf0041: read of address 0x00000000'.

It only happens 'sometimes' - I haven't managed to figure out any rhyme or reason, yet, for when - and only in the main thread. When it occurs, the call stack contains one incorrect entry:

For the main thread, which this is, it should show a large stack full of other items.

At this point, all other threads are inactive (mostly sitting in WaitForSingleObject or a similar function.) I have only seen this crash occur in the main thread. It always has the same call stack of one entry, in the same method at the same address. This method may or may not be related - we do use the VCL in our application. My bet, though, is that something (possibly quite a while ago) is corrupting the stack, and the address where it's crashing is effectively random. Note it has been the same address across several builds, though - it's probably not truly random.

Here is what I've tried:

  • Trying to reproduce it reliably at a certain point. I have found nothing that reproduces it every time, and a couple of things that occasionally do, or do not, for no apparent reason. These are not 'narrow' enough actions to narrow it down to a particular section of code. It may be timing related, but at the point the IDE breaks in, other threads are usually doing nothing. I can't rule out a threading problem, but think it's unlikely.
  • Building with extra debugging statements (extra debug info, extra asserts, etc.) After doing so, the crash never occurs.
  • Building with Codeguard enabled. After doing so, the crash never occurs and Codeguard shows no errors.

My questions:

1. How do I find the code that caused the crash? How do I do the equivalent of a stack backtrace?

2.

I am using Embarcadero RAD Studio 2010 (the project mostly contains C++ Builder code and small amounts of Delphi.)

Edit: I thought I should add what actually caused this. There was a thread that called ReadDirectoryChangesW and then, using GetOverlappedResult, waited on an event to continue and do something with the changes. The event was also signalled in order to terminate the thread after setting a status flag. The problem was that when the thread exited it never called CancelIo. As a result, Windows was still tracking changes and probably still writing to the buffer when the directory changed, even though the buffer, overlapped structure and event no longer existed (nor did the thread context in which they were created). Once CancelIo was called before the thread exited, there were no more crashes.
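The fix described above can be sketched roughly as follows. This is not the original code: the function name WatchDirectory, the separate stopEvent handle, and the buffer size are assumptions made for illustration (the original thread reused one event for both purposes). The point it shows is that a pending ReadDirectoryChangesW request is cancelled, and allowed to finish, before the buffer and OVERLAPPED structure on the thread's stack go away.

```cpp
#ifdef _WIN32
#include <windows.h>

// Hypothetical watcher thread body (names are illustrative only).
// `dir` must be a directory handle opened with
// FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OVERLAPPED.
// `stopEvent` is signalled by another thread to request termination.
DWORD WatchDirectory(HANDLE dir, HANDLE stopEvent)
{
    DWORD buffer[4096];              // DWORD-aligned notification buffer on this thread's stack
    OVERLAPPED ov;
    ZeroMemory(&ov, sizeof(ov));
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);
    if (ov.hEvent == NULL)
        return GetLastError();

    DWORD result = ERROR_SUCCESS;
    for (;;) {
        ResetEvent(ov.hEvent);
        if (!ReadDirectoryChangesW(dir, buffer, sizeof(buffer), TRUE,
                                   FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_LAST_WRITE,
                                   NULL, &ov, NULL)) {
            result = GetLastError();
            break;
        }

        HANDLE handles[2] = { stopEvent, ov.hEvent };
        DWORD wait = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
        if (wait != WAIT_OBJECT_0 + 1) {
            // Stop requested (or the wait failed) while the request is still
            // pending. Without this, Windows may later write into `buffer`
            // and `ov` after this stack frame is gone.
            CancelIo(dir);
            DWORD ignored = 0;
            // Block until the cancelled request has really completed.
            GetOverlappedResult(dir, &ov, &ignored, TRUE);
            break;
        }

        DWORD bytes = 0;
        if (GetOverlappedResult(dir, &ov, &bytes, FALSE) && bytes > 0) {
            // Process the FILE_NOTIFY_INFORMATION records in `buffer` here.
        }
    }

    CloseHandle(ov.hEvent);
    return result;
}
#endif
```

CancelIo only cancels requests issued by the calling thread, which is why the call sits inside the watcher thread itself rather than in the thread that signals the stop event.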

Recommended answer

Even when the IDE-provided stack trace isn't very complete, that doesn't mean there isn't still useful information on the stack. Open up the CPU view and check out the stack pane; for every CALL opcode, a return address is pushed on the stack. Since the stack grows downwards, you'll find these return addresses above the current stack location, i.e. by scrolling upwards in the stack pane.

The stack for the main thread will be somewhere around $00120000 or $00180000 (address space randomization in Vista and upwards has made it more random). Code for the main executable will be somewhere around $00400000. You can speculatively investigate elements on the stack that don't look like integer data (low values) or stack addresses ($00120000+ range) by right-clicking on the stack entry and selecting Follow -> Near Code, which will cause the disassembly window to jump to that code address. If it looks like invalid code, it's probably not a valid entry in the stack trace. If it's valid code, it may be OS code (frequently around $77000000 and above) in which case you won't have meaningful symbols, but every so often you'll hit on an actual proper stack entry.

This technique, though somewhat laborious, can get you meaningful stack trace info when the debugger isn't able to trace things through. It doesn't help you if ESP (the stack pointer) has been screwed with, though. Fortunately, that's pretty rare.
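The manual scan described above can also be expressed as a small heuristic, which may help if you ever dump the raw stack words and want to filter them offline. The sketch below is only an illustration: the types AddressRange and Candidate, and the functions precededByCall and scanStack, are hypothetical names, and it assumes a 32-bit x86 process where you have the stack words and the module's code bytes available as arrays. It keeps a stack word only if it falls inside the code range and the bytes just before it look like one of a few common CALL encodings. That range check also discards the small integers and stack addresses mentioned in the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a module's code range (32-bit addresses).
struct AddressRange {
    std::uint32_t begin;  // first code address, inclusive
    std::uint32_t end;    // one past the last code address
    bool contains(std::uint32_t a) const { return a >= begin && a < end; }
};

struct Candidate {
    std::size_t   slot;     // index of the stack word, 0 = word at ESP
    std::uint32_t address;  // possible return address
};

// True if the bytes just before `ret` match a few common CALL forms.
// `code[0]` is assumed to be the byte at address `range.begin`.
// This is a heuristic: data bytes can match by accident.
bool precededByCall(const std::vector<std::uint8_t>& code,
                    const AddressRange& range, std::uint32_t ret)
{
    if (!range.contains(ret)) return false;
    const std::size_t off = ret - range.begin;
    if (off > code.size()) return false;

    // E8 rel32: 5-byte near relative call.
    if (off >= 5 && code[off - 5] == 0xE8) return true;
    // FF 15 disp32: 6-byte indirect call through a memory operand.
    if (off >= 6 && code[off - 6] == 0xFF && code[off - 5] == 0x15) return true;
    // FF D0..D7: 2-byte call through a register.
    if (off >= 2 && code[off - 2] == 0xFF && (code[off - 1] & 0xF8) == 0xD0) return true;
    return false;
}

// Walk the dumped stack words and keep those that look like return addresses.
std::vector<Candidate> scanStack(const std::vector<std::uint32_t>& stackWords,
                                 const AddressRange& codeRange,
                                 const std::vector<std::uint8_t>& codeBytes)
{
    std::vector<Candidate> found;
    for (std::size_t i = 0; i < stackWords.size(); ++i) {
        if (precededByCall(codeBytes, codeRange, stackWords[i]))
            found.push_back(Candidate{i, stackWords[i]});
    }
    return found;
}
```

Each surviving candidate is still only a guess. As with Follow -> Near Code, you would confirm it by looking at the disassembly around that address.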
