Heap corruption under Win32; how to locate?


Question

I'm working on a multithreaded C++ application that is corrupting the heap. The usual tools for locating this corruption seem to be inapplicable. Old builds (18 months old) of the source code exhibit the same behaviour as the most recent release, so this has been around for a long time and just wasn't noticed; on the downside, source deltas can't be used to identify when the bug was introduced - there are a lot of code changes in the repository.

The trigger for the crashing behaviour is generating throughput in this system: socket transfer of data which is munged into an internal representation. I have a set of test data that will periodically cause the app to throw an exception (various places, various causes - including a heap alloc failing, thus: heap corruption).

The behaviour seems related to CPU power or memory bandwidth; the more of each the machine has, the easier it is to crash. Disabling hyper-threading, or one core of a dual-core machine, reduces the rate of corruption but does not eliminate it. This suggests a timing-related issue.

Now here's the rub:
When it's run under a lightweight debug environment (say Visual Studio 98 / AKA MSVC6) the heap corruption is reasonably easy to reproduce - ten or fifteen minutes pass before something fails horrendously and raises an exception, for example in an alloc. When running under a sophisticated debug environment (Rational Purify, VS2008/MSVC9 or even Microsoft Application Verifier) the system becomes memory-speed bound and doesn't crash (memory-bound: the CPU is not getting above 50%, the disk light is not on, the program is going as fast as it can, and the box is consuming 1.3G of 2G of RAM). So, I've got a choice between being able to reproduce the problem (but not identify the cause) and being able to identify the cause of a problem I can't reproduce.

My current best guesses as to where to go next are:


  1. Get an insanely grunty box (to replace the current dev box: 2 GB RAM in an E6550 Core2 Duo); this will make it possible to reproduce the crash-causing misbehaviour when running under a powerful debug environment; or
  2. Rewrite operators new and delete to use VirtualAlloc and VirtualProtect to mark memory as read-only as soon as it's done with. Run under MSVC6 and have the OS catch the bad guy who's writing to freed memory. Yes, this is a sign of desperation: who the hell rewrites new and delete?! I wonder if this is going to make it as slow as under Purify et al.
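Option 2 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a drop-in replacement for operator new and delete: the `guard` namespace and its function names are hypothetical, the caller is assumed to track each block's size, and the code assumes a C++11 compiler (it would need adjusting for MSVC6). The non-Windows branch, using mmap and mprotect, is there only so the sketch can be compiled elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

// Hypothetical helpers (names are not from the original post).
namespace guard {

// Size of one virtual-memory page on this machine.
inline std::size_t page_size() {
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

// Round a byte count up to a whole number of pages.
inline std::size_t round_up_to_page(std::size_t bytes) {
    const std::size_t p = page_size();
    return (bytes + p - 1) / p * p;
}

// Give each allocation its own page(s) so it can be protected individually.
// Returns a null pointer on failure.
inline void* allocate(std::size_t bytes) {
    const std::size_t len = round_up_to_page(bytes ? bytes : 1);
#ifdef _WIN32
    return VirtualAlloc(NULL, len, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// "Free" a block by making it read-only instead of releasing it.
// Any later write to it raises an access violation at the faulting instruction.
inline bool retire(void* p, std::size_t bytes) {
    assert(reinterpret_cast<std::uintptr_t>(p) % page_size() == 0);
    const std::size_t len = round_up_to_page(bytes ? bytes : 1);
#ifdef _WIN32
    DWORD old_protect = 0;
    return VirtualProtect(p, len, PAGE_READONLY, &old_protect) != 0;
#else
    return mprotect(p, len, PROT_READ) == 0;
#endif
}

}  // namespace guard
```

Because retired pages are never released, every allocation permanently consumes at least one page of address space (and on Windows each VirtualAlloc reservation is aligned to the allocation granularity, typically 64 KB), so a 32-bit process would probably run out of address space fairly quickly under load. Using PAGE_NOACCESS instead of PAGE_READONLY would also trap stray reads of freed memory.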

And, no: Shipping with Purify instrumentation built in is not an option.

A colleague just walked past and asked "Stack Overflow? Are we getting stack overflows now?!?"

Now, the question: how do I locate the heap corruptor?

Update: balancing new[] and delete[] seems to have gone a long way towards solving the problem. Instead of 15 minutes, the app now runs for about two hours before crashing. Not there yet. Any further suggestions? The heap corruption persists.
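For reference, the kind of mismatch that update refers to looks something like the sketch below. The function names are hypothetical and not taken from the application; the mismatched form is shown only in a comment because executing it is undefined behaviour.

```cpp
#include <cstddef>
#include <vector>

// Mismatched form (undefined behaviour, so shown only as a comment):
//
//     unsigned char* buf = new unsigned char[n];
//     ...
//     delete buf;      // wrong: array new must be paired with delete[]

// Matched raw form: every new[] is paired with delete[].
inline std::size_t sum_with_raw_array(std::size_t n) {
    unsigned char* buf = new unsigned char[n];
    std::size_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<unsigned char>(i);
        sum += buf[i];
    }
    delete[] buf;       // array form matches the array new above
    return sum;
}

// Owning-container form: std::vector releases its storage itself,
// so there is no delete to get wrong.
inline std::size_t sum_with_vector(std::size_t n) {
    std::vector<unsigned char> buf(n);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<unsigned char>(i);
        sum += buf[i];
    }
    return sum;
}
```

Replacing raw arrays with an owning container removes this class of mismatch altogether, although it does nothing for other sources of corruption such as writes through dangling pointers.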

Update: a release build under Visual Studio 2008 seems dramatically better; current suspicion rests on the STL implementation that ships with VS98.



  1. Reproduce the problem. Dr Watson will produce a dump that might be helpful in further analysis.


I'll take a note of that, but I'm concerned that Dr Watson will only be tripped up after the fact, not when the heap is getting stomped on.


Another try might be using WinDebug as a debugging tool, which is quite powerful while also being lightweight.

Got that going at the moment; again, not much help until something goes wrong. I want to catch the vandal in the act.


Maybe these tools will allow you at least to narrow the problem down to a certain component.

I don't hold much hope, but desperate times call for...

And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?

No I'm not, and I'll spend a couple of hours tomorrow going through the workspace (58 projects in it) and checking they're all compiling and linking with the appropriate flags.

Update: This took 30 seconds. Select all projects in the Settings dialog, then unselect until you find the project(s) that don't have the right settings (they all had the right settings).

Recommended answer

My first choice would be a dedicated heap tool such as pageheap.exe.

Rewriting new and delete might be useful, but that doesn't catch allocations made by lower-level code. If that is what you want, it is better to detour the low-level alloc APIs using Microsoft Detours.

Also run sanity checks such as: verify that your run-time libraries match (release vs. debug, multi-threaded vs. single-threaded, DLL vs. static lib), look for bad deletes (e.g. delete where delete[] should have been used), and make sure you're not mixing and matching your allocs.

Also try selectively turning off threads and see when/if the problem goes away.

What does the call stack etc look like at the time of the first exception?
