Crash reporting watchdog for when my application locks up on a customer's machine


Question

I'm working with a somewhat unreliable (Qt/Windows) application partly written for us by a third party (just trying to shift the blame there). Their latest version is more stable. Sort of. We're getting fewer reports of crashes, but we're getting lots of reports of it just hanging and never coming back. The circumstances are varied, and with the little information we can gather, we haven't been able to reproduce the problems.

So ideally, I'd like to create some sort of watchdog which notices that the application has locked up, and offers to send a crash report back to us. Nice idea, but there are problems:


  • How does the watchdog know the process has hung? Presumably we instrument the application to periodically say "all ok" to the watchdog, but where do we put that call so that it's guaranteed to happen frequently enough, yet isn't likely to be on a code path that the app ends up on when it's locked?

  • What information should the watchdog report when a crash happens? Windows has a decent debug API, so I'm confident that all the interesting data is accessible, but I'm not sure what would be useful for tracking down the problems.
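
One way to approach the first question, offered as a sketch rather than something taken from the post: let the main event loop update a timestamp (for example from a timer serviced by that loop), and let a separate watchdog thread check how stale that timestamp is. The `Heartbeat` class and its method names are hypothetical, and plain standard C++ is used instead of Qt or Windows APIs to keep the example self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the original post): remembers when the
// application's main loop last reported "all ok".
class Heartbeat {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    Heartbeat() : last_(Clock::now().time_since_epoch().count()) {}

    // Called from the main/UI thread, e.g. from a timer driven by the event loop.
    void beat() {
        last_.store(Clock::now().time_since_epoch().count(),
                    std::memory_order_relaxed);
    }

    // Called from the watchdog thread: true if no beat arrived within `timeout`.
    // `now` is a parameter only so the check can be exercised without sleeping.
    bool stalled(Clock::duration timeout,
                 Clock::time_point now = Clock::now()) const {
        assert(timeout > Clock::duration::zero());
        const Clock::time_point last{
            Clock::duration{last_.load(std::memory_order_relaxed)}};
        return now - last > timeout;
    }

private:
    std::atomic<Clock::rep> last_;  // tick count of the most recent beat
};
```

In the application, `beat()` would be called from the event loop and the watchdog thread would poll `stalled(...)` with a timeout comfortably longer than the slowest legitimate operation. Driving the beat from the event loop means a blocked loop stops beating, which is exactly the condition of interest, but this check alone cannot tell a hung application from one that is merely slow.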

Recommended answer

You want a combination of a minidump (use DrWatson to create these if you don't want to add your own minidump generation code) and userdump to trigger a minidump creation on a hang.
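
If you do add your own minidump generation code, a minimal sketch could look like the following. `MiniDumpWriteDump` is the DbgHelp function for writing dumps. The wrapper name `write_minidump`, the choice of `MiniDumpNormal`, and the non-Windows stub that simply returns false are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>  // MiniDumpWriteDump; link with Dbghelp.lib
#endif

// Hypothetical helper: writes a minidump of the current process to `path`.
// Returns false on failure, and always returns false on non-Windows builds.
bool write_minidump(const std::wstring& path) {
    assert(!path.empty());
#ifdef _WIN32
    HANDLE file = CreateFileW(path.c_str(), GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) {
        return false;
    }
    // No exception information is passed because a hang is not an exception;
    // the dump still records the call stack of every thread in the process.
    const BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                                      file, MiniDumpNormal,
                                      nullptr, nullptr, nullptr);
    CloseHandle(file);
    return ok != FALSE;
#else
    (void)path;  // nothing to do on other platforms in this sketch
    return false;
#endif
}
```

This could be wired to a "report a hang" action or called from a watchdog thread. The thread that calls it will show up in the dump inside the dump-writing call itself, so it is the other threads' stacks that carry the useful information.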

The thing about automatically detecting a hang is that it's difficult to decide when something's hung and when it's just slow or blocked by IO wait. I personally prefer to allow the user to crash the app deliberately when they think it's hung. Apart from being a lot easier (my apps don't tend to hang often, if at all :) ), it also helps them to "be part of the solution". They like that.

Firstly, check out the classic Bugslayer article concerning crash dumps and symbols, which also has some excellent information regarding what's going on with these things.

Second, get userdump, which allows you to create the dumps, along with the instructions for setting it up to generate dumps.

When you have the dump, open it in WinDbg, and you will be able to inspect the entire program state, including threads and callstacks, registers, memory and parameters to functions. I think you'll be particularly interested in using the "~*kp" command in WinDbg to get the callstack of every thread, and the "!locks" command to show all locking objects. I think you'll find that the hang will be due to a deadlock of synchronisation objects, which will be difficult to track down as all threads tend to be waiting in a WaitForSingleObject call, but look further down the callstacks to find the application threads (rather than 'framework' threads like background notifications and network routines). Once you've narrowed them down, you can see what calls were being made, and possibly add some logging instrumentation to the app to give you more information the next time it fails.
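
As one possible shape for that logging instrumentation, here is a small sketch of a scope tracer: it writes an "enter" line when a named section starts and a "leave" line when it ends, so a log that ends with an "enter" and no matching "leave" points at where a thread was stuck. The `ScopeTrace` class and the section names are hypothetical, not something the original post describes.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical RAII helper: logs entry to and exit from a named section.
// Writes are not synchronized here; with several threads, give each thread
// its own stream or guard the shared one with a mutex.
class ScopeTrace {
public:
    ScopeTrace(std::ostream& log, std::string name)
        : log_(log), name_(std::move(name)) {
        assert(!name_.empty());
        write("enter ");
    }
    ~ScopeTrace() { write("leave "); }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    void write(const char* what) {
        // Build the whole line first, then flush it, so the line is already
        // on its way out if the thread hangs immediately afterwards.
        std::ostringstream line;
        line << "[thread " << std::this_thread::get_id() << "] "
             << what << name_ << '\n';
        log_ << line.str() << std::flush;
    }

    std::ostream& log_;
    std::string name_;
};
```

Placing a `ScopeTrace` at the top of the functions that take locks or call into the third-party code (for example `ScopeTrace trace(logFile, "load_settings");`, with `logFile` being whatever stream the app logs to) keeps the instrumentation to one line per function.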

Good luck.

P.S. A quick Google search reminded me of this: Debugging deadlocks. (CDB is the command-line equivalent of WinDbg.)
