Can a correct fail-safe process-shared barrier be implemented on Linux?

Question

In a past question, I asked about implementing pthread barriers without destruction races:

How can barriers be destroyable as soon as pthread_barrier_wait returns? (http://stackoverflow.com/questions/5886614/how-can-barriers-be-destroyable-as-soon-as-pthread-barrier-wait-returns)

Michael Burr answered with a solution that is perfect for process-local barriers but fails for process-shared barriers. We later worked through some ideas, but never reached a satisfactory conclusion, and didn't even begin to get into resource failure cases.

Is it possible on Linux to make a barrier that meets these conditions:


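  • Process-shared (can be created in any shared memory).

  • Safe to unmap or destroy the barrier from any thread immediately after the barrier wait function returns.

  • Cannot fail due to resource allocation failure.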

Michael's attempt at solving the process-shared case (see the linked question) has the unfortunate property that some kind of system resource must be allocated at wait time, meaning the wait can fail. And it's unclear what a caller could reasonably do when a barrier wait fails, since the whole point of the barrier is that it's unsafe to proceed until the remaining N-1 threads have reached it...

A kernel-space solution might be the only way, but even that's difficult due to the possibility of a signal interrupting the wait with no reliable way to resume it...

Recommended answer

After a long discussion with bdonlan on SO chat, I think I have a solution. Basically, we break the problem down into the two self-synchronized deallocation issues: the destroy operation and unmapping.

Handling destruction is easy: Simply make the pthread_barrier_destroy function wait for all waiters to stop inspecting the barrier. This can be done by having a usage count in the barrier, atomically incremented/decremented on entry/exit to the wait function, and having the destroy function spin waiting for the count to reach zero. (It's also possible to use a futex here, rather than just spinning, if you stick a waiter flag in the high bit of the usage count or similar.)
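
The usage-count idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of any real pthread implementation: `struct sketch_barrier` and the `sketch_*` functions are hypothetical names, the fields that implement the barrier itself are left out, and the destroy function spins with `sched_yield` rather than sleeping on a futex.

```c
#include <stdatomic.h>
#include <sched.h>

/* Hypothetical barrier layout; only the usage count is shown. */
struct sketch_barrier {
    atomic_uint users;   /* threads currently inside the wait function */
    /* the fields implementing the barrier itself would go here */
};

/* Called on entry to the wait function, before touching any other field. */
static void sketch_enter(struct sketch_barrier *b)
{
    atomic_fetch_add(&b->users, 1);
}

/* Called on exit from the wait function. This must be the caller's last
 * access to the barrier. Returns 1 if the caller was the last user out. */
static int sketch_exit(struct sketch_barrier *b)
{
    return atomic_fetch_sub(&b->users, 1) == 1;
}

/* Destroy waits until no thread is still inspecting the barrier. */
static void sketch_destroy(struct sketch_barrier *b)
{
    while (atomic_load(&b->users) != 0)
        sched_yield();   /* assumption: plain spinning instead of a futex wait */
}
```

The return value of `sketch_exit` is not needed for destruction itself, but the "last one out" condition is the same one the unmapping scheme relies on to decide when a lock can be dropped.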

Handling unmapping is also easy, but non-local: ensure that munmap or mmap with the MAP_FIXED flag cannot occur while barrier waiters are in the process of exiting, by adding locking to the syscall wrappers. This requires a specialized sort of reader-writer lock. The last waiter to reach the barrier should grab a read lock on the munmap rw-lock, which will be released when the final waiter exits (when decrementing the user count results in a count of 0). munmap and mmap can be made reentrant (as some programs might expect, even though POSIX doesn't require it) by making the writer lock recursive. Actually, a sort of lock where readers and writers are entirely symmetric, and each type of lock excludes the opposite type of lock but not the same type, should work best.
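
A minimal sketch of such a symmetric lock is shown below, assuming it lives inside the C library of a single process and is built from an ordinary mutex and condition variable. All names (`struct side_lock`, `side_lock_*`, `SIDE_BARRIER_EXIT`, `SIDE_UNMAP`) are hypothetical, and a real implementation would more likely use atomics and futexes and would have to address fairness, which this sketch ignores.

```c
#include <pthread.h>

/* Hypothetical two-sided lock: any number of holders may share one side,
 * but the two sides never hold the lock at the same time. */
struct side_lock {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    unsigned        holders[2];   /* holders per side */
};

#define SIDE_LOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, { 0, 0 } }

enum { SIDE_BARRIER_EXIT = 0, SIDE_UNMAP = 1 };

/* Block until the opposite side has no holders, then join our side. */
static void side_lock_acquire(struct side_lock *l, int side)
{
    pthread_mutex_lock(&l->mtx);
    while (l->holders[!side] != 0)
        pthread_cond_wait(&l->cond, &l->mtx);
    l->holders[side]++;
    pthread_mutex_unlock(&l->mtx);
}

/* Non-blocking variant: returns 1 on success, 0 if the other side holds it. */
static int side_lock_tryacquire(struct side_lock *l, int side)
{
    int ok;
    pthread_mutex_lock(&l->mtx);
    ok = (l->holders[!side] == 0);
    if (ok)
        l->holders[side]++;
    pthread_mutex_unlock(&l->mtx);
    return ok;
}

/* Leave our side; the last holder out wakes any waiters on the other side. */
static void side_lock_release(struct side_lock *l, int side)
{
    pthread_mutex_lock(&l->mtx);
    if (--l->holders[side] == 0)
        pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mtx);
}
```

In this sketch the lock only counts holders and does not track owners, so the thread that acquires the `SIDE_BARRIER_EXIT` side (the last waiter to arrive) need not be the thread that releases it (the final waiter to exit). Likewise, nested `munmap` or `mmap` calls on the `SIDE_UNMAP` side do not block one another, which gives the reentrancy described above without a separate recursion counter.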
