How do I recover a semaphore when the process that decremented it to zero crashes?

Question

I have multiple applications, compiled with g++, running on Ubuntu. I'm using named semaphores to coordinate between different processes.

All works fine except in the following situation: if one of the processes calls sem_wait() or sem_timedwait() to decrement the semaphore and then crashes or is killed with kill -9 (SIGKILL) before it gets a chance to call sem_post(), then from that moment on, the named semaphore is "unusable".

By "unusable", what I mean is the semaphore count is now zero, and the process that should have incremented it back to 1 has died or been killed.
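To see the "unusable" state concretely, here is a minimal sketch, assuming Linux with glibc; the helper count_after_holder_is_killed and the semaphore name are made up for illustration. A child process takes a freshly created named semaphore and then kills itself with SIGKILL, so it never reaches sem_post(). Afterwards sem_trywait() fails with EAGAIN, and sem_getvalue() reports the count but nothing about which process decremented it.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <semaphore.h>
#include <signal.h>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo helper: returns the count the parent sees after a child
// took the semaphore and was killed before sem_post(), or -1 if the
// semaphore was unexpectedly available or setup failed.
int count_after_holder_is_killed()
{
    // Per-process name so repeated runs do not collide.
    const std::string name = "/sem-crash-demo-" + std::to_string( getpid() );
    sem_unlink( name.c_str() );  // ignore ENOENT: we want a fresh semaphore
    sem_t *sem = sem_open( name.c_str(), O_CREAT, 0600, 1 );
    if ( sem == SEM_FAILED )
        return -1;

    pid_t child = fork();
    if ( child == 0 )
    {
        if ( sem_wait( sem ) == 0 )  // count 1 -> 0
            raise( SIGKILL );        // "crash": sem_post() is never called
        _exit( 1 );
    }

    int value = -1;
    if ( child > 0 )
    {
        waitpid( child, nullptr, 0 );
        // Nobody will ever post, so a non-blocking wait fails with EAGAIN.
        if ( sem_trywait( sem ) == -1 && errno == EAGAIN )
            sem_getvalue( sem, &value );  // reports the count (0), not who holds it
    }
    sem_close( sem );
    sem_unlink( name.c_str() );
    return value;
}
```

On glibc versions before 2.34 the sem_* functions live in libpthread, so link with -pthread there.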

I cannot find a sem_*() API that would tell me that the process that last decremented it has crashed.

Am I missing an API somewhere?

Here is how I open the named semaphore:

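#include <fcntl.h>      // O_CREAT, O_CLOEXEC
#include <sys/stat.h>   // S_IRWXU, S_IRWXG, S_IRWXO
#include <semaphore.h>  // sem_open(), SEM_FAILED

sem_t *sem = sem_open( "/testing",
    O_CREAT     |   // create the semaphore if it does not already exist
    O_CLOEXEC   ,   // close on execute
    S_IRWXU     |   // permissions:  user
    S_IRWXG     |   // permissions:  group
    S_IRWXO     ,   // permissions:  other
    1           );  // initial value of the semaphore
if ( sem == SEM_FAILED )
    throw "sem_open() failed";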
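One detail of sem_open() makes a stuck semaphore more persistent than it looks: when O_CREAT is given and a semaphore with that name already exists, the mode and value arguments are ignored. So the initial value of 1 above only applies the very first time, and restarting every application does not reset a semaphore that was left at zero. A minimal sketch that simulates two runs inside one process (the helper value_after_reopen and the semaphore name are hypothetical):

```cpp
#include <cassert>
#include <fcntl.h>
#include <semaphore.h>
#include <string>
#include <unistd.h>

// Hypothetical demo helper: returns the count seen by a second "run" that
// reopens a semaphore the first "run" left at zero, or -1 on failure.
int value_after_reopen()
{
    const std::string name = "/sem-reopen-demo-" + std::to_string( getpid() );
    sem_unlink( name.c_str() );  // start clean; ENOENT is fine

    // First "run": create with initial value 1, take it, never post.
    sem_t *first = sem_open( name.c_str(), O_CREAT, 0600, 1 );
    if ( first == SEM_FAILED )
        return -1;
    sem_wait( first );   // count 1 -> 0
    sem_close( first );  // closing does NOT give the count back

    // Second "run": the same call again, asking for initial value 1.
    sem_t *second = sem_open( name.c_str(), O_CREAT, 0600, 1 );
    if ( second == SEM_FAILED )
    {
        sem_unlink( name.c_str() );
        return -1;
    }
    int value = -1;
    sem_getvalue( second, &value );  // still 0: the value argument was ignored
    sem_close( second );
    sem_unlink( name.c_str() );      // only now would sem_open() recreate it at 1
    return value;
}
```

Named POSIX semaphores have kernel persistence: unless sem_unlink() is called, one exists until the system is shut down.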

Here is how I decrement it:

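struct timespec timeout = { 0, 0 };
clock_gettime( CLOCK_REALTIME, &timeout );  // sem_timedwait() takes an absolute CLOCK_REALTIME deadline
timeout.tv_sec += 5;

int rc;
while ( ( rc = sem_timedwait( sem, &timeout ) ) == -1 && errno == EINTR )
    ;  // interrupted by a signal handler: retry with the same deadline

if ( rc == -1 )
{
    throw errno == ETIMEDOUT ? "timeout while waiting for semaphore"
                             : "error while waiting for semaphore";
}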

Answer

It turns out there isn't a way to reliably recover the semaphore. Sure, anyone can call sem_post() on the named semaphore to get the count above zero again, but how do you tell when such a recovery is needed? The API is too limited and gives no indication that this has happened.
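Why a blind sem_post() is unsafe: a process that gives up waiting cannot distinguish a crashed holder from a slow one. The sketch below applies the naive rule "if I can't get it, assume the holder died and post" while the holder is still alive. The helper holders_after_naive_recovery and the semaphore name are hypothetical, and both "processes" are simulated with sequential calls in a single process for brevity.

```cpp
#include <cassert>
#include <fcntl.h>
#include <semaphore.h>
#include <string>
#include <unistd.h>

// Hypothetical demo helper: returns how many holders the semaphore admitted
// under the naive recovery rule (-1 on setup failure).
int holders_after_naive_recovery()
{
    const std::string name = "/sem-recover-demo-" + std::to_string( getpid() );
    sem_unlink( name.c_str() );
    sem_t *sem = sem_open( name.c_str(), O_CREAT, 0600, 1 );
    if ( sem == SEM_FAILED )
        return -1;

    int holders = 0;
    if ( sem_trywait( sem ) == 0 )  // holder A takes it (1 -> 0) and is merely slow
        ++holders;

    // Holder B cannot get in, guesses that A crashed, and "recovers" it.
    if ( sem_trywait( sem ) == -1 )
        sem_post( sem );            // 0 -> 1, although A never released it

    if ( sem_trywait( sem ) == 0 )  // B now gets in while A still holds it
        ++holders;

    sem_close( sem );
    sem_unlink( name.c_str() );
    return holders;                 // 2: mutual exclusion is broken
}
```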

Beware of the IPC tools that are also available: the common tools ipcmk, ipcrm, and ipcs work only with the older System V (SysV) semaphores. They do not work with the newer POSIX semaphores.
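For POSIX named semaphores, the nearest equivalents are the filesystem and the API itself. On Linux they are created in a virtual filesystem normally mounted at /dev/shm, with names of the form sem.<name>, so listing /dev/shm plays the role of ipcs, and sem_unlink() plays the role of ipcrm. A minimal sketch of the latter; remove_named_semaphore is a hypothetical helper, not an existing tool:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <semaphore.h>

// Hypothetical "ipcrm" stand-in for a POSIX named semaphore: removes the
// name so that the next sem_open( name, O_CREAT, ..., 1 ) creates a fresh
// semaphore with count 1.
// Returns 0 on success, or the errno value on failure (ENOENT if no such name).
int remove_named_semaphore( const char *name )
{
    if ( sem_unlink( name ) == -1 )
    {
        int err = errno;  // save before perror() can change it
        std::perror( "sem_unlink" );
        return err;
    }
    return 0;
}
```

Unlinking only removes the name: processes that already have the semaphore open keep using the old object until they close it. This is a cleanup step for when every user has stopped, not a live recovery.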

But there are other things that can serve as a lock and that the operating system releases automatically when an application dies, even in ways a signal handler cannot catch (such as kill -9). Two examples: a listening socket bound to a particular port, or a lock on a specific file.
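A minimal sketch of the first option, assuming IPv4 on the loopback interface. The helper try_lock_port and the port number used in the test are hypothetical, and any unrelated program that happens to bind the same port would also "hold" the lock.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical sketch: "acquire" a system-wide lock by binding a listening
// TCP socket to a fixed loopback port. Returns the socket fd while the lock
// is held, or -1 if the port is taken (bind() fails with EADDRINUSE) or on
// any other error.
int try_lock_port( std::uint16_t port )
{
    int fd = socket( AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0 );
    if ( fd == -1 )
        return -1;

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons( port );
    addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );

    // Deliberately no SO_REUSEPORT, which would let a second socket share the port.
    if ( bind( fd, reinterpret_cast<sockaddr *>( &addr ), sizeof addr ) == -1 ||
         listen( fd, 1 ) == -1 )
    {
        close( fd );
        return -1;
    }
    return fd;  // close( fd ), or process death, releases the lock
}
```

The holder keeps the returned descriptor open for as long as it needs the lock. Unlike a file lock, this gives no blocking wait: a waiting process has to retry bind() periodically.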

I decided that a lock on a file is the solution I needed. So instead of calling sem_wait() and sem_post(), I'm using:

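lockf( fd, F_LOCK, 0 );    // instead of sem_wait(): blocks until this process holds the lock
// ... critical section ...
lockf( fd, F_ULOCK, 0 );   // instead of sem_post(): releases the lock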
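Wrapped in a small RAII class, the pair also becomes exception-safe. This is a sketch: the class FileLock and the lock-file path used in the test are hypothetical. The descriptor is opened read-write because lockf() places a write lock; on Linux, lockf() is an interface on top of fcntl() record locking.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Hypothetical RAII replacement for a sem_wait()/sem_post() pair: the
// constructor blocks in lockf( F_LOCK ), the destructor calls lockf( F_ULOCK ).
class FileLock
{
public:
    explicit FileLock( const std::string &path )
        // lockf() needs a descriptor open for writing.
        : fd_( open( path.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0666 ) )
    {
        if ( fd_ == -1 )
            throw std::runtime_error( "cannot open lock file " + path );
        // len 0 locks from the current offset (0 after open) to end of file and beyond.
        if ( lockf( fd_, F_LOCK, 0 ) == -1 )
        {
            close( fd_ );
            throw std::runtime_error( "lockf(F_LOCK) failed on " + path );
        }
    }

    ~FileLock()
    {
        lockf( fd_, F_ULOCK, 0 );  // explicit release; close() would release it too
        close( fd_ );
    }

    FileLock( const FileLock & ) = delete;
    FileLock &operator=( const FileLock & ) = delete;

private:
    int fd_;
};
```

Because these are fcntl()-style locks, they belong to the process: two threads of the same process do not exclude each other, closing any other descriptor the process holds for the same file silently drops the lock, and a child created by fork() does not inherit it.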

When the application exits for any reason, the kernel closes its file descriptors, which also releases the file lock. Other client apps waiting for the "semaphore" are then free to proceed as expected.
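A minimal check of that behaviour, assuming Linux; the helper lock_released_after_kill and the lock-file path used in the test are hypothetical. A child process takes the lock and is killed with SIGKILL without ever calling lockf( fd, F_ULOCK, 0 ). While the child is alive, the parent's non-blocking F_TLOCK attempt fails; once the child is dead, it succeeds.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo helper: true if the lock is busy while a child holds it
// and becomes free once that child is killed with SIGKILL (no F_ULOCK).
bool lock_released_after_kill( const char *path )
{
    // lockf() needs a descriptor open for writing.
    int fd = open( path, O_RDWR | O_CREAT | O_CLOEXEC, 0666 );
    if ( fd == -1 )
        return false;

    int ready[2];  // the child writes one byte once it holds the lock
    if ( pipe( ready ) == -1 )
    {
        close( fd );
        return false;
    }

    pid_t child = fork();
    if ( child == 0 )
    {
        // fcntl()-style locks are owned by the process, so the child's lock
        // conflicts with the parent even though both use the same fd.
        char byte = 'x';
        if ( lockf( fd, F_LOCK, 0 ) == 0 && write( ready[1], &byte, 1 ) == 1 )
            pause();  // hold the lock until killed
        _exit( 1 );
    }

    close( ready[1] );  // so read() sees EOF if the child dies early
    bool ok = false;
    if ( child > 0 )
    {
        char byte;
        ok = read( ready[0], &byte, 1 ) == 1;  // wait until the child holds the lock

        // While the child lives, a non-blocking attempt must fail.
        ok = ok && lockf( fd, F_TLOCK, 0 ) == -1 && ( errno == EACCES || errno == EAGAIN );

        kill( child, SIGKILL );  // like "kill -9": no chance to call F_ULOCK
        waitpid( child, nullptr, 0 );

        // The kernel dropped the dead child's lock, so this now succeeds.
        ok = ok && lockf( fd, F_TLOCK, 0 ) == 0;
        lockf( fd, F_ULOCK, 0 );
    }
    close( ready[0] );
    close( fd );
    return ok;
}
```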

Thanks for the help, guys.