Application crash with no explanation

Question

I'd like to apologize in advance, because this is not a very good question.

I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.

When it crashes, the event logs have an entry stating that the application failed, but give no clue as to why. The entry also names a faulting module, but that doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.

Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.

All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottommost exception handler (the try/catch (...)) does nothing; it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.

The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.

The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).

We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.

What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.

Recommended answer

If your app is evaporating and not generating a dump file, then it is likely that an exception is being generated which your app doesn't (or can't) handle. This could happen in two instances:

1) A top-level exception is generated and there is no matching catch block for that exception type.

2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged. This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
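To make case 2 concrete, here is a minimal, portable sketch of an exception thrown from inside a catch(...) handler. The names do_work, write_crash_log, and run_worker are hypothetical and not from the original post. The outer try block exists only so the sketch can report what escaped; it is not something the poster's code is known to have.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for crash logging that itself fails,
// for example because the heap is already corrupted.
void write_crash_log() {
    throw std::runtime_error("logging failed");
}

// Hypothetical worker body: this is the original fault.
void do_work() {
    throw std::logic_error("original fault");
}

// Returns the message of whichever exception finally leaves the handler.
std::string run_worker() {
    try {
        try {
            do_work();
        } catch (...) {
            // The handler has no protection of its own: anything thrown
            // here replaces the original exception and keeps propagating.
            write_crash_log();
        }
    } catch (const std::exception& e) {
        // Reachable only because this sketch adds an outer guard.
        return e.what();
    }
    return "handled";
}
```

If the inner try/catch were the outermost frame of a std::thread entry function, nothing would be left to catch the second exception and std::terminate would be called, which matches the "no dump, no log" symptom described in the question.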

A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:

"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"

- or -

"My program might crash, but if it does I want to create a dump file on the way down."

The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.

The latter is a noble and robust approach, but implementing it is much more difficult than it might seem, and it is fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.

Moreover, there are exceptions that a catch block simply can't catch, such as SEH exceptions (unless the code is compiled with /EHa). For that reason, I always write an unhandled-exception handler and register it with Windows via SetUnhandledExceptionFilter. Within my exception handler, every single byte I need is statically allocated before the program even starts up. The best (most robust) thing to do within this handler is to trigger a separate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful not to call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.
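A rough sketch of that arrangement follows, assuming a hypothetical external tool named dumper.exe that takes a process ID and writes the MiniDump itself. The tool, its command line, and the 30-second wait are illustrative assumptions, not part of the original answer. The Windows calls used (SetUnhandledExceptionFilter, CreateProcessW, WaitForSingleObject, GetCurrentProcessId) are real Win32 APIs; on other platforms the sketch compiles to a stub.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cwchar>

constexpr bool kHasWin32 = true;

// Everything the filter needs is static, so nothing is allocated at crash time.
static wchar_t g_dumper_cmd[512];
static STARTUPINFOW g_startup_info;
static PROCESS_INFORMATION g_process_info;

static LONG WINAPI crash_filter(EXCEPTION_POINTERS*) {
    // Only Win32 API calls here: no CRT, no new/delete, no formatting.
    if (CreateProcessW(nullptr, g_dumper_cmd, nullptr, nullptr, FALSE, 0,
                       nullptr, nullptr, &g_startup_info, &g_process_info)) {
        // Give the hypothetical dumper bounded time to snapshot this process.
        WaitForSingleObject(g_process_info.hProcess, 30000);
    }
    return EXCEPTION_EXECUTE_HANDLER;  // let the process terminate afterwards
}

bool install_crash_filter() {
    // Ordinary formatting is fine here: the process is still healthy.
    g_startup_info.cb = sizeof g_startup_info;
    std::swprintf(g_dumper_cmd, 512, L"dumper.exe %lu",
                  static_cast<unsigned long>(GetCurrentProcessId()));
    SetUnhandledExceptionFilter(crash_filter);
    return true;
}
#else
constexpr bool kHasWin32 = false;

// Stub so the sketch still builds where the Win32 API is unavailable.
bool install_crash_filter() { return false; }
#endif
```

install_crash_filter would be called once, early in startup. Note that this sketch does not pass the exception pointers to the external tool, so the dump would show the process state without the original exception record, and the filter is generally not invoked while a debugger is attached.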
