Realistic deadlock example in CUDA/OpenCL


Question


For a tutorial I'm writing, I'm looking for a "realistic" and simple example of a deadlock caused by ignorance of SIMT / SIMD.

Any input would be appreciated.

…
int x = threadID / 2;
if (threadID > x) {
    value[threadID] = 42;
    barrier();
}
else {
    value2[threadID / 2] = 13;
    barrier();
}
result = value[threadID / 2] + value2[threadID / 2];


I know, it is neither proper CUDA C nor OpenCL C.

Answer


A simple deadlock that a novice CUDA programmer can actually run into is trying to implement a critical section for a single thread, where the surrounding work should ultimately be performed by all threads. It goes more or less like this:

__global__ void kernel() {
  __shared__ int semaphore;
  semaphore = 0;
  __syncthreads();
  while (true) {
    int prev = atomicCAS(&semaphore, 0, 1);
    if (prev == 0) {
      // critical section
      semaphore = 0;
      break;
    }
  }
}


The atomicCAS instruction ensures that exactly one thread gets 0 assigned to prev, while all others get 1. When that one thread finishes its critical section, it sets the semaphore back to 0 so that other threads have a chance to enter the critical section.


The problem is that while one thread gets prev=0, the other 31 threads belonging to the same SIMD unit (a warp) get the value 1. At the if-statement, the CUDA scheduler puts that single thread on hold (masks it out) and lets the other 31 threads continue their work. Under normal circumstances this is a good strategy, but in this particular case you end up with one critical-section thread that is never executed and 31 threads spinning forever. Deadlock.
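One common way around this (not part of the original answer; a sketch assuming pre-Volta lockstep warps and a warp size of 32) is to let only one leader thread per warp compete for the lock, so the spin loop never diverges within a warp:

```cuda
__global__ void kernel() {
  __shared__ int semaphore;
  if (threadIdx.x == 0) semaphore = 0;
  __syncthreads();
  if (threadIdx.x % 32 == 0) {                     // one leader per warp
    while (atomicCAS(&semaphore, 0, 1) != 0) { }   // only leaders spin; no intra-warp divergence
    // critical section, executed by the leader on behalf of its warp
    atomicExch(&semaphore, 0);                     // release the lock
  }
  __syncthreads();                                 // the rest of the warp rejoins here
}
```

Since each spinning leader sits in its own warp, masking one leader out never blocks the thread that holds the lock.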


Also note the existence of break, which leads the control flow outside of the while loop. If you omit the break instruction and have some more operations after the if-block that are supposed to be executed by all threads, it may actually help the scheduler avoid the deadlock.
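As a sketch of what this means (a hypothetical rewrite of the kernel above, with a `done` flag replacing break): the branch now reconverges at the bottom of the loop body rather than jumping out of it, which gives the masked-out winning thread a chance to run before the next iteration. Whether this actually avoids the deadlock still depends on the compiler's reconvergence choices, as the "may" in the answer indicates.

```cuda
__global__ void kernel() {
  __shared__ int semaphore;
  semaphore = 0;
  __syncthreads();
  bool done = false;
  while (!done) {
    int prev = atomicCAS(&semaphore, 0, 1);
    if (prev == 0) {
      // critical section
      semaphore = 0;   // release the lock
      done = true;
    }
    // operations executed by all threads would go here; control
    // reconverges at this point instead of breaking out of the loop
  }
}
```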


Regarding your example given in the question: in CUDA it is explicitly forbidden to put __syncthreads() in SIMD-divergent code. The compiler won't catch it, but the manual speaks of "undefined behaviour". In practice, on pre-Fermi devices, all __syncthreads() calls are treated as the same barrier. Under that assumption, your code would actually terminate without an error. One should not rely on this behaviour, though.
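For completeness, the question's example can be restructured so that every thread reaches the same barrier unconditionally (a sketch, keeping the question's barrier()/threadID pseudocode rather than proper CUDA C or OpenCL C):

```cuda
int x = threadID / 2;
if (threadID > x) {
    value[threadID] = 42;
} else {
    value2[threadID / 2] = 13;
}
barrier();  // hoisted out of the branch: all threads reach this point
result = value[threadID / 2] + value2[threadID / 2];
```

Hoisting the barrier out of the divergent branch is the standard fix: the synchronization point is no longer conditional, so no thread can be masked out while others wait at it.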
