Realistic deadlock example in CUDA/OpenCL
Question
For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT/SIMD execution.

Any input is appreciated.
…
int x = threadID / 2;
if (threadID > x) {
    value[threadID] = 42;
    barrier();
}
else {
    value2[threadID/2] = 13;
    barrier();
}
result = value[threadID/2] + value2[threadID/2];
I know it is neither proper CUDA C nor OpenCL C.
Answer
A simple deadlock that a novice CUDA programmer can actually run into occurs when one tries to implement a critical section that should ultimately be executed by every thread, one at a time. It goes more or less like this:
__global__ void kernel() {
    __shared__ int semaphore;
    semaphore = 0;
    __syncthreads();
    while (true) {
        int prev = atomicCAS(&semaphore, 0, 1);
        if (prev == 0) {
            // critical section
            semaphore = 0;
            break;
        }
    }
}
The atomicCAS instruction ensures that exactly one thread gets 0 assigned to prev, while all the others get 1. When that one thread finishes its critical section, it sets the semaphore back to 0 so that the other threads get a chance to enter the critical section.
The problem is that while one thread gets prev == 0, the other 31 threads belonging to the same SIMD unit (the warp) get 1. At the if-statement the CUDA scheduler puts that single thread on hold (masks it out) and lets the other 31 threads continue their work. In normal circumstances this is a good strategy, but in this particular case the one critical-section thread is never executed while 31 threads wait forever. Deadlock.
Note also the break, which leads the control flow outside of the while loop. If you omit the break instruction and put some more operations after the if-block that are supposed to be executed by all threads, it may actually help the scheduler avoid the deadlock.
Regarding the example given in the question: in CUDA it is explicitly forbidden to put __syncthreads() in SIMD-divergent code. The compiler won't catch it, but the manual speaks of "undefined behaviour". In practice, on pre-Fermi devices, all __syncthreads() calls were treated as the same barrier. Under that assumption, your code would actually terminate without an error. One should not rely on this behaviour though.
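The portable fix for the question's example is therefore to hoist the barrier out of the divergent branches, so that every thread reaches the same __syncthreads(). A minimal CUDA sketch (the kernel signature is hypothetical; the indexing follows the question's code as written):

```cuda
__global__ void kernel_fixed(int *value, int *value2, int *result) {
    int threadID = threadIdx.x;
    int x = threadID / 2;
    if (threadID > x) {
        value[threadID] = 42;
    } else {
        value2[threadID / 2] = 13;
    }
    __syncthreads();  // outside the branch: reached by all threads
    result[threadID] = value[threadID / 2] + value2[threadID / 2];
}
```

Both branches now finish before any thread waits, and the single barrier is unconditional, so the divergence no longer matters.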
This concludes the article on realistic deadlock examples in CUDA/OpenCL. We hope the answer above is helpful, and thank you for supporting IT屋!