What exactly causes a 'spin on suspend' error in Android?

Question

I'm currently having trouble debugging some Android code which relies on a native library. One native call in particular seems prone to this "spin on suspend" error. It generally manifests like so:

threadid=2: spin on suspend #2 threadid=48 (pcf=3)

Thus far I haven't been able to determine exactly what's failing here, except that after about 10 of these messages, my application encounters a SIGSTKFLT and exits. Every time, the first thread is the GC, and the second thread is whatever thread is currently executing the native code. The portion of the stack printed along with this message always has a native method at the top of the stack.

What exactly is happening when Dalvik complains about this, and how can I begin to debug the cause so I can fix it?

EDIT: An interesting wrinkle -- after the native developer made some changes, I now also see the following error sometimes:

PopFrame missed the break
VM aborting
Fatal signal 11 (SIGSEGV) at 0xdeadd00d (code=1)

It's also extremely odd to me that the thread dump shows my native method at the top of the stack, yet the thread state is RUNNABLE, not NATIVE -- how can that be possible?

Recommended answer

The basic problem is that Dalvik is a safe-point suspend VM, and uses "stop the world" garbage collection. This means that, for the GC to operate, it has to wait for all threads to reach a point where it can be sure that they won't be altering the heap.

For some reason, one of your threads isn't responding to the GC thread's request to suspend. It's not actually executing in native code; if it were, the thread would be in NATIVE state, which is considered safe. (Native code can only reach the managed heap through JNI calls, and all JNI calls do a suspend check.)

For performance reasons, the JIT is capable of chaining blocks of compiled code together in a way that skips the suspend checks. If a thread takes too long to suspend, the suspending thread will "unchain" the blocks, and wait a little longer. Eventually it starts complaining, and eventually-eventually it gives up and aborts the VM.
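
The escalation described above (wait, complain, eventually give up) can be pictured with a toy model. This is only a sketch under stated assumptions: the names `wait_for_suspend`, `SafePointPoll`, `COMPLAIN_EVERY` and `MAX_COMPLAINTS` are hypothetical, the thresholds are invented for illustration, and the JIT "unchain" step is left out. It is not Dalvik's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Outcome of waiting for one target thread to reach a safe point. */
typedef enum { SUSPENDED, SUSPENDED_AFTER_COMPLAINTS, GAVE_UP } WaitResult;

/* Hypothetical callback: returns true once the target thread has parked. */
typedef bool (*SafePointPoll)(int poll_number, void *ctx);

/* Made-up thresholds, chosen only to keep the example small. */
enum { COMPLAIN_EVERY = 100, MAX_COMPLAINTS = 10 };

/* Poll the target until it parks; log a complaint every COMPLAIN_EVERY
 * polls and give up after MAX_COMPLAINTS complaints. */
WaitResult wait_for_suspend(SafePointPoll poll, void *ctx, int *complaints_out)
{
    assert(poll != NULL && complaints_out != NULL);
    int complaints = 0;
    for (int i = 1; ; i++) {
        if (poll(i, ctx)) {
            *complaints_out = complaints;
            return complaints > 0 ? SUSPENDED_AFTER_COMPLAINTS : SUSPENDED;
        }
        if (i % COMPLAIN_EVERY == 0) {
            complaints++;
            fprintf(stderr, "spin on suspend #%d\n", complaints);
            if (complaints >= MAX_COMPLAINTS) {
                *complaints_out = complaints;
                return GAVE_UP; /* the real VM aborts at this stage */
            }
        }
    }
}

/* Example target that reaches a safe point at the poll number in ctx. */
bool parks_at(int poll_number, void *ctx)
{
    return poll_number >= *(int *)ctx;
}

/* Example target that never reaches a safe point, like a jammed thread. */
bool never_parks(int poll_number, void *ctx)
{
    (void)poll_number;
    (void)ctx;
    return false;
}
```

A thread that checks in promptly is suspended silently; one that never checks in produces a run of complaints and then the give-up result, which matches the "about 10 of these messages" pattern in the question only by construction of the made-up constants.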

Some devices use a vendor-modified version of Dalvik that gets this wrong, and aborts can happen on tight loops. I wouldn't expect to see a native method at the top of the stack in this case.

Your best bet for debugging is to attach gdb at the point it goes unhappy and try to figure out what the target thread is doing. It's possible that the native code trashed the VM state or return stack in some way, and so on its return from native code the thread gets jammed up.

Update after the edit: The dvmPopFrame() function is used to pop a stack frame off the managed stack. When the VM calls your native method it inserts a "break" frame, so that when the stack is unrolled for exception handling the VM doesn't blow past the call site. (It's also used for managed-code method calls issued by the VM, e.g. for reflection or <clinit>.) The message PopFrame missed the break means that the break frame wasn't found.

Break frames have a null method pointer. When unrolling the stack, dvmPopFrame() continues as long as it sees a non-null method pointer (meaning it's not a break frame) and a non-null previous-frame pointer (meaning you haven't hit the top of the stack). If you hit the top of the stack, you've missed the break -- all Dalvik stacks start with a real method (sometimes a "fake" method if the thread was attached to the VM with JNI).
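
The unrolling rule just described can be modelled in a few lines. This is a simplified sketch, not the Dalvik source: the `Frame` struct and `pop_to_break` are hypothetical stand-ins that keep only the two fields the explanation mentions (the real layout is in Stack.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a stack frame; not the real Dalvik layout. */
typedef struct Frame {
    const char *method;  /* NULL marks a break frame */
    struct Frame *prev;  /* NULL means there is no older frame */
} Frame;

/* Walk from the current frame toward older frames while the frame has a
 * non-NULL method (not a break frame) and a non-NULL previous-frame
 * pointer (not yet at the end of the stack).
 * Returns true and stores the frame beneath the break in *resume_at if a
 * break frame was found; returns false if the walk ran out of frames
 * first, which corresponds to "missed the break". */
bool pop_to_break(Frame *cur, Frame **resume_at)
{
    assert(cur != NULL && resume_at != NULL);
    while (cur->method != NULL && cur->prev != NULL) {
        cur = cur->prev;
    }
    if (cur->method != NULL) {
        /* Loop ended because prev was NULL, not because of a break frame. */
        *resume_at = NULL;
        return false;
    }
    *resume_at = cur->prev;
    return true;
}
```

In this model, a native frame whose previous-frame pointer has been overwritten with NULL makes the walk stop immediately without ever seeing the break frame, so the function reports a missed break.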

So my guess would be that the native code trashed the stack, nulling the previous-frame pointer. One technique for sorting this out would be to have the VM call a native method that calls the actual native method; the "middle man" allocates some stuff on the stack, sets it to known values, calls the actual method, then verifies that its stack allocations are unchanged before returning.

(It may be necessary to use the values to prevent the compiler from optimizing them away; if you use something like:

if (jniEnv == NULL) {
    printf("my stuff is ...", ...);
}

then it'll never actually run, since the JNIEnv* is never null... but the compiler doesn't know that.)
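
A minimal sketch of the "middle man" idea, written as plain C so it can be compiled without the JNI headers. All names here (`call_with_stack_guard`, `NativeFn`, `GUARD_WORDS`, `GUARD_PATTERN`) are hypothetical; in a real JNI build the wrapper would be the exported native method and would forward the JNIEnv* and other arguments to the real one. The guard is declared `volatile` as an alternative way to keep the compiler from optimizing it away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { GUARD_WORDS = 16 };
#define GUARD_PATTERN 0xA5A5A5A5u

/* Hypothetical signature for the real native routine being wrapped. */
typedef int (*NativeFn)(void *arg);

/* Fill the guard region with a known value. */
void fill_guard(volatile unsigned *guard, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        guard[i] = GUARD_PATTERN;
    }
}

/* Return true if every guard word still holds the known value. */
bool guard_intact(const volatile unsigned *guard, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (guard[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}

/* The "middle man": put a guard region on its own stack frame, call the
 * real routine, then check whether the guard was overwritten. */
bool call_with_stack_guard(NativeFn fn, void *arg, int *result_out)
{
    assert(fn != NULL && result_out != NULL);
    volatile unsigned guard[GUARD_WORDS];
    fill_guard(guard, GUARD_WORDS);
    *result_out = fn(arg);
    return guard_intact(guard, GUARD_WORDS);
}

/* Example routine that leaves the caller's stack alone. */
int well_behaved(void *arg)
{
    (void)arg;
    return 42;
}
```

Where the compiler places the guard relative to the callee's frame is not guaranteed, so an intact guard does not prove the native code is innocent; a damaged guard, on the other hand, is strong evidence that it wrote outside its own frame.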

For a full description of the Dalvik stack layout, see dalvik/vm/interp/Stack.h.

It's normal for the thread to be in RUNNABLE when returning from native code. Your native method is still at the top because the code that pops it off failed and aborted the VM.
