Realistic deadlock example in CUDA/OpenCL
Question
For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT/SIMD execution.

Any input is appreciated.
…
int x = threadID / 2;
if (threadID > x) {
    value[threadID] = 42;
    barrier();
}
else {
    value2[threadID/2] = 13;
    barrier();
}
result = value[threadID/2] + value2[threadID/2];
I know it is neither proper CUDA C nor OpenCL C.
Answer
A simple deadlock that a novice CUDA programmer can actually run into occurs when one tries to implement a critical section that should ultimately be executed by every thread, one at a time. It goes more or less like this:
__global__ void kernel() {
    __shared__ int semaphore;
    semaphore = 0;
    __syncthreads();
    while (true) {
        int prev = atomicCAS(&semaphore, 0, 1);
        if (prev == 0) {
            // critical section
            semaphore = 0;
            break;
        }
    }
}
The atomicCAS instruction ensures that exactly one thread gets 0 assigned to prev, while all the others get 1. When that one thread finishes its critical section, it sets the semaphore back to 0 so that the other threads get a chance to enter the critical section.
The problem is that while one thread gets prev == 0, the other 31 threads belonging to the same SIMD unit (the warp) get 1. At the if-statement the CUDA scheduler puts that single thread on hold (masks it out) and lets the other 31 threads continue their work. In normal circumstances this is a good strategy, but in this particular case the one critical-section thread is never executed while 31 threads wait forever. Deadlock.
Note also the break, which leads the control flow outside of the while loop. If you omit the break instruction and put some more operations after the if-block that are supposed to be executed by all threads, it may actually help the scheduler avoid the deadlock.
Regarding the example given in the question: in CUDA it is explicitly forbidden to put __syncthreads() in SIMD-divergent code. The compiler won't catch it, but the manual speaks of "undefined behaviour". In practice, on pre-Fermi devices, all __syncthreads() calls were treated as the same barrier. Under that assumption, your code would actually terminate without an error. One should not rely on this behaviour though.
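The portable fix for the question's example is therefore to hoist the barrier out of the divergent branches, so that every thread reaches the same __syncthreads(). A minimal CUDA sketch (the kernel signature is hypothetical; the indexing follows the question's code as written):

```cuda
__global__ void kernel_fixed(int *value, int *value2, int *result) {
    int threadID = threadIdx.x;
    int x = threadID / 2;
    if (threadID > x) {
        value[threadID] = 42;
    } else {
        value2[threadID / 2] = 13;
    }
    __syncthreads();  // outside the branch: reached by all threads
    result[threadID] = value[threadID / 2] + value2[threadID / 2];
}
```

Both branches now finish before any thread waits, and the single barrier is unconditional, so the divergence no longer matters.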
This concludes the article on realistic deadlock examples in CUDA/OpenCL. We hope the answer above is helpful, and thank you for supporting IT屋!