条件同步线程 &死锁(或不死锁) [英] conditional syncthreads & deadlock (or not)

查看:24
本文介绍了条件同步线程 &死锁(或不死锁)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

跟进 Q:EarlyExitDroppedThreads

根据上面的链接,下面的代码应该是死锁的.
请解释为什么这不会死锁.(费米上的 Cuda 5)

According to the above links, the code below should dead-lock.
Please explain why this does NOT dead-lock. (Cuda 5 on a Fermi)

__device__ int add[144];
__device__ int result;

add<<<1,96>>>();  // the calling 

__global__ void add() {
 for(idx=72>>1; idx>0; idx>>=1) {
  if(thrdIdx < idx) 
   add[thrdIdx]+= add[thrdIdx+idx];
  else
   return;
  __syncthreads();
 }

 if(thrdIdx == 0)
  result= add[0];
}

推荐答案

从技术上讲,这是一个定义不明确的程序.

This is technically an ill-defined program.

大多数,但不是全部(例如 G80 不支持),NVIDIA GPU 支持提前退出这种方式,因为硬件为每个块维护一个活动线程计数,该计数用于屏障同步而不是初始线程计数为块.

Most, but not all (for example G80 does not), NVIDIA GPUs support early exit in this way because the hardware maintains an active thread count for each block, and this count is used for barrier synchronization rather than the initial thread count for the block.

因此,当你的代码中的 __syncthreads() 到达时,硬件不会等待任何已经返回的线程,程序运行不会死锁.

Therefore, when the __syncthreads() in your code is reached, the hardware will not wait on any threads that have already returned, and the program runs without deadlock.

这种风格更常见的用法是:

A more common use of this style is:

__global__ void foo(int n, ...) {
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  if (idx >= n) return;
  ... // do some computation with remaining threads
}

重要提示:屏障计数是按 Warp 更新的(请参阅 此处),而不是每个线程.因此,您可能会遇到这样的情况,例如,只有少数(或零个)线程提前返回.这意味着屏障计数不会减少.但是,只要每个 warp 中至少有一个线程到达屏障,它就不会死锁.

Important note: barrier counts are updated per-warp (see here), not per-thread. So you may have the case where, say, only a few (or zero) threads return early. This means that the barrier count is not decremented. However, as long as at least one thread from each warp reaches the barrier, it will not deadlock.

因此,一般来说,您需要谨慎使用障碍.但具体来说,像这样的(简单)提前退出模式确实有效.

So in general, you need to use barriers carefully. But specifically, (simple) early exit patterns like this do work.

针对您的具体情况.

迭代 Idx==36:2 个活动 warp,因此屏障退出计数为 64.warp 0 中的所有线程都到达屏障,计数从 0 增加到 32.warp 1 中的 4 个线程到达屏障,计数从 32 增加到 64,并且经线 0 和 1 从屏障中释放.阅读上面的链接以了解为什么会发生这种情况.

Iteration Idx==36: 2 active warps so barrier exit count is 64. All threads from warp 0 reach barrier, incrementing count from 0 to 32. 4 threads from warp 1 reach barrier, incrementing count from 32 to 64, and warps 0 and 1 are released from barrier. Read the link above to understand why this happens.

Iteration Idx==18: 1 个活动 warp,因此屏障退出计数为 32.来自 warp 0 的 18 个线程到达屏障,计数从 0 增加到 32.满足屏障并释放 warp 0.

Iteration Idx==18: 1 active warp so barrier exit count is 32. 18 threads from warp 0 reach barrier, incrementing count from 0 to 32. Barrier is satisfied and warp 0 is released.

等等……

这篇关于条件同步线程 &amp;死锁(或不死锁)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆