How to trigger spurious wake-up within a Linux application?


Question

Some background:

I have an application that relies on third-party hardware and a closed-source driver. The driver currently has a bug that causes the device to stop responding after a random period of time. This is caused by an apparent deadlock within the driver, and it disrupts the proper functioning of my application, which runs in an always-on, 24/7, highly visible environment.

What I have found is that attaching GDB to the process and immediately detaching it makes the device resume working. This was my first indication that there was a thread-locking issue within the driver itself: some kind of race condition that leads to a deadlock. Attaching GDB was evidently reshuffling the threads and probably pushing them out of their wait state, causing them to re-evaluate their conditions and thus breaking the deadlock.

The question:

My question is simply this: is there a clean way for an application to make all threads within the program interrupt their wait state? One thing that definitely works (at least on my implementation) is to send a SIGSTOP followed immediately by a SIGCONT from another process (e.g. from bash):

kill -STOP "$(cat /var/run/mypidfile)" ; kill -CONT "$(cat /var/run/mypidfile)"

This triggers a spurious wake-up within the process and everything comes back to life.
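The same stop-and-continue trick can also be driven from inside the application instead of from a shell. The following is only a sketch under stated assumptions, not something from the original post: `kick_all_threads` is a hypothetical name and the 10 ms pause is an arbitrary value. A forked child is used because a stopped process cannot send itself SIGCONT, and the pause is there because POSIX discards pending stop signals when SIGCONT is generated, so sending the two signals back to back can in principle lose the stop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: stop and then resume the calling process,
 * mirroring the SIGSTOP/SIGCONT pair sent from the shell.
 * Returns 0 on success, -1 on failure.
 */
int kick_all_threads(void)
{
    pid_t target = getpid();
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: only async-signal-safe calls are used here, because
         * the parent may be multithreaded at the time of fork(). */
        int ok = (kill(target, SIGSTOP) == 0);
        /* Assumed pause: give the stop time to take effect before
         * SIGCONT would discard it. */
        poll(NULL, 0, 10);
        ok = (kill(target, SIGCONT) == 0) && ok;
        _exit(ok ? 0 : 1);
    }

    /* Parent: reap the helper, retrying if the wait is interrupted. */
    int status = 0;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

A watchdog thread could call `kick_all_threads()` when the device appears to have stalled. The sketch assumes the application does not ignore SIGCHLD or reap child processes elsewhere, since either would make the `waitpid` call fail.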

I'm hoping there is an intelligent method to trigger a spurious wake-up of all threads within my process. Think pthread_cond_broadcast(...), but without having access to the actual condition variable being waited on.

Is this possible, or is relying on a program like kill my only approach?

Recommended answer

The way you're doing it right now is probably the most correct and the simplest. There is no "wake all waiting futexes in a given process" operation in the kernel, which is what you would need to achieve this more directly.
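To make that limitation concrete: the futex system call is keyed by the address of a 32-bit word, so a waker has to know exactly which word a waiter is sleeping on. The sketch below assumes Linux with glibc, which offers no `futex()` wrapper, hence the raw `syscall`. The names `futex_op`, `futex_word`, `futex_waiter` and `futex_demo` are illustrative only.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (illustrative name). */
static long futex_op(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static uint32_t futex_word = 0; /* 0 = not ready, 1 = ready */

static void *futex_waiter(void *arg)
{
    (void)arg;
    /* Sleep only while the word still holds 0, and re-check after every
     * return: FUTEX_WAIT can come back without the value having changed. */
    while (__atomic_load_n(&futex_word, __ATOMIC_ACQUIRE) == 0)
        futex_op(&futex_word, FUTEX_WAIT_PRIVATE, 0);
    return NULL;
}

/* Starts one waiter, then wakes it. Returns 0 on success, -1 on failure. */
int futex_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, futex_waiter, NULL) != 0)
        return -1;

    __atomic_store_n(&futex_word, 1, __ATOMIC_RELEASE);
    /* The wake names the address; there is no per-process "wake all". */
    futex_op(&futex_word, FUTEX_WAKE_PRIVATE, INT_MAX);

    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Because a closed-source driver does not expose the addresses its threads wait on, application code has nothing to pass to `FUTEX_WAKE`, which is why a process-wide signal is the practical route.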

Note that if the failure-to-wake "deadlock" is in pthread_cond_wait, but interrupting it with a signal breaks the deadlock, then the bug cannot be in the application; it must be in the implementation of pthread condition variables. glibc has known unfixed bugs in its condition variable implementation; see http://sourceware.org/bugzilla/show_bug.cgi?id=13165 and related bug reports. However, you might have found a new one, since I don't think the existing known ones can be worked around by breaking out of the futex wait with a signal. If you can report this bug to the glibc bug tracker, it would be very helpful.
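For context, this is the usual way application code waits on a condition variable: the predicate is re-checked in a loop under the mutex, so an extra or spurious wake-up is harmless. It is a generic sketch with hypothetical names (`demo_lock`, `demo_cv`, `demo_ready`, `run_predicate_demo`), not code from the driver or application in question.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, for illustration only. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cv = PTHREAD_COND_INITIALIZER;
static bool demo_ready = false;

static void *demo_consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    /* Loop, not "if": a wake-up is only a hint to re-check the predicate. */
    while (!demo_ready)
        pthread_cond_wait(&demo_cv, &demo_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 0 once the consumer has observed the predicate, -1 on failure. */
int run_predicate_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, demo_consumer, NULL) != 0)
        return -1;

    /* A broadcast with the predicate still false stands in for a
     * spurious wake-up: a waiting consumer simply goes back to sleep. */
    pthread_mutex_lock(&demo_lock);
    pthread_cond_broadcast(&demo_cv);
    pthread_mutex_unlock(&demo_lock);

    /* The real notification: change the predicate, then broadcast. */
    pthread_mutex_lock(&demo_lock);
    demo_ready = true;
    pthread_cond_broadcast(&demo_cv);
    pthread_mutex_unlock(&demo_lock);

    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

Code written this way tolerates being woken at arbitrary moments, which is what makes forcing a wake-up from outside a safe thing to try.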
