条件同步电缆死锁(或不) [英] conditional syncthreads & deadlock (or not)

查看:149
本文介绍了条件同步电缆死锁(或不)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

A追踪Q: EarlyExit DroppedThreads < a>

A follow up Q to: EarlyExit and DroppedThreads

根据上述链接,下面的代码应该死锁。

请解释为什么这不是死锁。 (Fermi上的Cuda 5)

According to the above links, the code below should dead-lock.
Please explain why this does NOT dead-lock. (Cuda 5 on a Fermi)

__device__ int add[144];
__device__ int result;

add<<<1,96>>>();  // the calling 

__global__ void add() {
 for(idx=72>>1; idx>0; idx>>=1) {
  if(thrdIdx < idx) 
   add[thrdIdx]+= add[thrdIdx+idx];
  else
   return;
  __syncthreads();
 }

 if(thrdIdx == 0)
  result= add[0];
}


推荐答案

定义程序。

大多数,但不是全部(例如G80不支持),NVIDIA GPU支持以这种方式提前退出,因为硬件维护每个块的活动线程计数,

Most, but not all (for example G80 does not), NVIDIA GPUs support early exit in this way because the hardware maintains an active thread count for each block, and this count is used for barrier synchronization rather than the initial thread count for the block.

因此,当 __ syncthreads()在你的代码中,硬件不会等待已经返回的任何线程,并且程序运行时没有死锁。

Therefore, when the __syncthreads() in your code is reached, the hardware will not wait on any threads that have already returned, and the program runs without deadlock.

这种风格的更常见的用法是:

A more common use of this style is:

__global__ void foo(int n, ...) {
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  if (idx >= n) return;
  ... // do some computation with remaining threads
}

重要注意:屏障计数会每次更新(请参阅此处 ),而不是每个线程。所以你可能有这样的情况,例如,只有几个(或零)线程早退。这意味着屏障计数不递减。但是,只要至少有一个线程从每个warp到达障碍,它就不会死锁。

Important note: barrier counts are updated per-warp (see here), not per-thread. So you may have the case where, say, only a few (or zero) threads return early. This means that the barrier count is not decremented. However, as long as at least one thread from each warp reaches the barrier, it will not deadlock.

因此,一般来说,你需要仔细使用障碍。

So in general, you need to use barriers carefully. But specifically, (simple) early exit patterns like this do work.

编辑:针对您的具体情况。

for your specific case.

迭代Idx == 36:2活动翘曲,因此障碍退出计数为64.来自翘曲0的所有线程到达障碍,从0到32递增计数。来自翘曲1的4个线程到达障碍,从32递增计数到64,并且翘曲0和1从屏障释放。阅读上面的链接,了解为什么会发生这种情况。

Iteration Idx==36: 2 active warps so barrier exit count is 64. All threads from warp 0 reach barrier, incrementing count from 0 to 32. 4 threads from warp 1 reach barrier, incrementing count from 32 to 64, and warps 0 and 1 are released from barrier. Read the link above to understand why this happens.

迭代Idx == 18:1主动扭曲,所以障碍出口计数为32.来自扭曲0的18个线程到达障碍,从0到32递增计数。翘曲0被释放。

Iteration Idx==18: 1 active warp so barrier exit count is 32. 18 threads from warp 0 reach barrier, incrementing count from 0 to 32. Barrier is satisfied and warp 0 is released.

Etc ...

这篇关于条件同步电缆死锁(或不)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆