Can I use __syncthreads() after having dropped threads?


Question

Is it safe to use __syncthreads() in a block where I have purposefully dropped threads using return?

The documentation states that __syncthreads() must be called by every thread in the block or else it will lead to a deadlock, but in practice I have never experienced such behavior.

Sample code:

__global__ void kernel(float* data, size_t size) {
    // Drop excess threads if user put too many in kernel call.
    // After the return, there are `size` active threads.
    if (threadIdx.x >= size) {
        return;
    }

    // ... do some work ...

    __syncthreads(); // Is this safe?

    // For the rest of the kernel, we need to drop one excess thread
    // After the return, there are `size - 1` active threads
    if (threadIdx.x + 1 == size) {
        return;
    }

    // ... do more work ...

    __syncthreads(); // Is this safe?
}

Answer

The short answer is "No". Warp-level branch divergence around a __syncthreads() instruction will cause a deadlock and result in a kernel hang. Your code example is not guaranteed to be safe or correct. The correct way to implement the code would be like this:

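__global__ void kernel(...)
{
    // Threads outside the range skip the work but stay alive.
    if (tidx < N) {
        // Code stanza #1
    }

    // Every thread in the block reaches this barrier.
    __syncthreads();

    if (tidx < N) {
        // Code stanza #2
    }

    // etc
}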

so that the __syncthreads() instructions are executed unconditionally.
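To make the pattern concrete, here is a minimal sketch of the question's two-stage kernel rewritten this way. The kernel name `adjacent_sum`, the constant `kBlockSize`, the helper `run_example`, and the neighbour-sum workload are all illustrative assumptions, not part of the original question. It also assumes a single block and `size <= kBlockSize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned int kBlockSize = 128;  // assumed block size for this sketch

// Stage 1 needs `size` threads, stage 2 needs `size - 1` threads.
// No thread returns early, so the barrier is reached by the whole block.
__global__ void adjacent_sum(const float* in, float* out, unsigned int size)
{
    __shared__ float tile[kBlockSize];
    const unsigned int tidx = threadIdx.x;

    // Stanza #1: excess threads skip the load instead of returning.
    if (tidx < size) {
        tile[tidx] = in[tidx];
    }

    // Executed unconditionally by every thread in the block.
    __syncthreads();

    // Stanza #2: the last in-range thread has no right-hand neighbour.
    if (tidx + 1 < size) {
        out[tidx] = tile[tidx] + tile[tidx + 1];
    }
}

// Hypothetical host driver: returns 0 if the results match, 1 otherwise.
int run_example()
{
    const unsigned int size = 100;  // deliberately smaller than kBlockSize
    float h_in[kBlockSize] = {0.0f};
    float h_out[kBlockSize] = {0.0f};
    for (unsigned int i = 0; i < size; ++i) {
        h_in[i] = static_cast<float>(i);
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, size * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, size * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }
    cudaMemcpy(d_in, h_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // Launch more threads than elements, as in the question.
    adjacent_sum<<<1, kBlockSize>>>(d_in, d_out, size);
    const cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, (size - 1) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    // Each output is i + (i + 1), which is exact in float for these values.
    for (unsigned int i = 0; i + 1 < size; ++i) {
        if (h_out[i] != 2.0f * static_cast<float>(i) + 1.0f) return 1;
    }
    return 0;
}

int main()
{
    const int status = run_example();
    std::puts(status == 0 ? "ok" : "failed");
    return status;
}
```

The only structural change from the question's code is that each `return` becomes a guard around the work, which keeps both stages' thread counts the same while leaving the barrier outside any branch.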


EDIT: To add some information that confirms this assertion: __syncthreads() calls are compiled into the PTX bar.sync instruction on all architectures. The PTX 2.0 guide (p. 133) documents bar.sync and includes the following warning:

Barriers are executed on a per-warp basis as if all the threads in a warp are active. Thus, if any thread in a warp executes a bar instruction, it is as if all the threads in the warp have executed the bar instruction. All threads in the warp are stalled until the barrier completes, and the arrival count for the barrier is incremented by the warp size (not the number of active threads in the warp). In conditionally executed code, a bar instruction should only be used if it is known that all threads evaluate the condition identically (the warp does not diverge). Since barriers are executed on a per-warp basis, the optional thread count must be a multiple of the warp size.

So despite any assertions to the contrary, it is not safe to have conditional branching around a __syncthreads() call unless you can be 100% certain that every thread in any given warp follows the same code path and no warp divergence can occur.
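One case that does meet this requirement is a condition that depends only on `blockIdx`: it evaluates identically for every thread in a block, so no warp in that block can diverge on it. The sketch below is an illustration under that assumption. The names `block_sums`, `kThreads`, and `run_block_sums` are hypothetical, and the kernel assumes `blockDim.x == kThreads` with `kThreads` a power of two. Whole surplus blocks exit early, while per-thread bounds checks are handled with a predicate rather than a `return`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned int kThreads = 128;  // assumed block size, a power of two

// Per-block sum with a block-uniform early exit.
__global__ void block_sums(const float* in, float* out, unsigned int size)
{
    __shared__ float tile[kThreads];
    const unsigned int first = blockIdx.x * blockDim.x;

    // Block-uniform condition: either every thread in this block returns
    // here or none does, so no barrier below is reached by a partial block.
    if (first >= size) {
        return;
    }

    // Per-thread condition: handled with a predicate, not a return.
    const unsigned int i = first + threadIdx.x;
    tile[threadIdx.x] = (i < size) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction; the barrier stays outside the divergent `if`.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        out[blockIdx.x] = tile[0];
    }
}

// Hypothetical host driver: returns 0 if the results match, 1 otherwise.
int run_block_sums()
{
    const unsigned int size = 300;  // needs 3 blocks of 128 threads
    const unsigned int blocks = 4;  // one surplus block on purpose
    float h_in[size];
    for (unsigned int i = 0; i < size; ++i) h_in[i] = 1.0f;
    float h_out[blocks] = {-1.0f, -1.0f, -1.0f, -1.0f};

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, size * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }
    cudaMemcpy(d_in, h_in, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, h_out, blocks * sizeof(float), cudaMemcpyHostToDevice);

    block_sums<<<blocks, kThreads>>>(d_in, d_out, size);
    const cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return 1;

    // Blocks 0 and 1 are full, block 2 covers 44 elements, and the
    // surplus block 3 exits early and leaves its slot untouched.
    const float expected[blocks] = {128.0f, 128.0f, 44.0f, -1.0f};
    for (unsigned int b = 0; b < blocks; ++b) {
        if (h_out[b] != expected[b]) return 1;
    }
    return 0;
}

int main()
{
    const int status = run_block_sums();
    std::puts(status == 0 ? "ok" : "failed");
    return status;
}
```

The design choice here is to separate the two kinds of conditions: anything that varies per thread becomes a guard around the work, and only conditions that are uniform across the block are allowed to skip a barrier.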
