What to make of an "impossible" stack trace after a crash?

Question

My program appears to be suffering from a very hard-to-reproduce bug: Once in a blue moon, when a user puts his Mac to sleep and later on wakes it back up again, one of my program's child processes will crash immediately after the Mac wakes up.

When this happens Apple's crash-reporter mechanism reliably reports a stack trace like this one:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x967f9a6a __pthread_kill + 10
1   libsystem_c.dylib               0x9003dacf pthread_kill + 101
2   libsystem_c.dylib               0x900744f8 abort + 168
3   com.meyersound.VirtualD-Mitri   0x0014438e muscle::CrashSignalHandler(int) + 190
4   libsystem_c.dylib               0x9002886b _sigtramp + 43
5   ???                             0xffffffff 0 + 4294967295
6   com.meyersound.VirtualD-Mitri   0x001442d0 muscle::ParsePortArg(muscle::Message const&, muscle::String const&, unsigned short&, unsigned long) + 80
7   com.meyersound.VirtualD-Mitri   0x005b3393 qnet::RepDBPeer::Pulse(muscle::PulseNode::PulseArgs const&) + 1187
8   com.meyersound.VirtualD-Mitri   0x0015717b muscle::PulseNode::PulseAux(unsigned long long) + 203
9   com.meyersound.VirtualD-Mitri   0x000cfb90 muscle::ReflectServer::ServerProcessLoop() + 3232
10  com.meyersound.VirtualD-Mitri   0x00607c7e dcasldmain(int, char**) + 2222
11  com.meyersound.VirtualD-Mitri   0x0072c14d dmitridmain(int, char**) + 4749
12  com.meyersound.VirtualD-Mitri   0x0000bc3a main + 4938
13  com.meyersound.VirtualD-Mitri   0x000061ab _start + 209
14  com.meyersound.VirtualD-Mitri   0x000060d9 start + 41

This is all well and good, except (cue eerie music) -- it's logically impossible. In particular, not only does my RepDBPeer::Pulse() method never call ParsePortArg, the crashing process never calls ParsePortArg anywhere! (I grepped all of my source code twice to make sure)

So my question is, just what is this stack trace trying to tell me? Is this most likely a case of Thread 0's stack getting corrupted badly enough that the stack-trace mechanism has gone off the rails and fingered an innocent bystander as the culprit? Or is it possible that Apple's wakeup mechanism somehow "jumped" the program counter into ParsePortArg() (whereupon the resulting confusion caused the crash)? Or is there some other deeper magic going on here that I can't even imagine?

The crashing process in question is a vanilla non-GUI background process that is a child process spawned by a Qt GUI process, for what that's worth.

Recommended answer

I'm going to assume you have some amount of optimization turned on. There's no magic to stack traces. They become fuzzier (read: less accurate) once code is inlined or omitted, which is precisely what a C++ optimizer does. In the case of ParsePortArg, there is a "+ 80" at the end of that line, meaning the address lies 80 bytes past the entry point of that function in the code segment. The true value the trace recovered is the address 0x001442d0, and ParsePortArg is simply the nearest symbol that the stack dump could attach to it. You were right to assume it was a red herring.
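To make the offset arithmetic concrete, here is a minimal Python sketch that takes two frame lines copied from the crash report above and subtracts the trailing offset from the reported address to get the entry point of the named symbol. The regular expression and the helper name `symbol_base` are illustrative assumptions, not part of any Apple tool; only the frame text comes from the report.

```python
import re

# Pattern for symbolicated frames of the form shown in the crash report:
#   <index>  <image>  0x<address> <symbol> + <decimal offset>
# (The regex is an assumption fitted to the lines above, not an official format spec.)
FRAME_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x([0-9a-fA-F]+)\s+(.*?)\s+\+\s+(\d+)\s*$"
)

def symbol_base(frame_line):
    """Return (symbol, address, entry_point) where entry_point = address - offset."""
    match = FRAME_RE.match(frame_line)
    if match is None:
        raise ValueError("not a symbolicated frame: %r" % frame_line)
    address = int(match.group(3), 16)   # address is printed in hex
    offset = int(match.group(5))        # offset is printed in decimal
    return match.group(4), address, address - offset

# Frames 6 and 3, copied verbatim from the report.
PARSE_PORT_ARG_FRAME = (
    "6   com.meyersound.VirtualD-Mitri   0x001442d0 "
    "muscle::ParsePortArg(muscle::Message const&, muscle::String const&, "
    "unsigned short&, unsigned long) + 80"
)
CRASH_HANDLER_FRAME = (
    "3   com.meyersound.VirtualD-Mitri   0x0014438e "
    "muscle::CrashSignalHandler(int) + 190"
)

if __name__ == "__main__":
    for line in (PARSE_PORT_ARG_FRAME, CRASH_HANDLER_FRAME):
        symbol, address, base = symbol_base(line)
        print("%s\n  address=0x%08x  entry=0x%08x" % (symbol, address, base))
```

Under this arithmetic, ParsePortArg would start at 0x00144280, and the computed entry point of CrashSignalHandler comes out as 0x001442d0, the same address reported in frame 6. Whether that means frame 6 holds a leftover handler address rather than a genuine return address is a guess that the crash report alone cannot confirm, but it fits the view that the frame is not a real call into ParsePortArg.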

Barring any other data, I would make the very conservative guess that your program expects some pointer to remain valid that is no longer valid on wakeup from sleep. Look at the disassembly for the instruction at that address. I bet it is trying to read from a memory address.
